Checking for duplicates in a List of Objects C# - linq

I am looking for a really fast way to check for duplicates in a list of objects.
I was thinking of simply looping through the list and doing a manual comparison, but I thought that LINQ might provide a more elegant solution...
Suppose I have a class like this:
public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }
    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}
And I have a list of those objects
List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe...
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe
I need to find the dupes in that list. When I find one, I need to do some additional logic, not necessarily remove it.
When I use LINQ, my GroupBy call fails to compile with this error:
'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?)
This tells me that I am missing a using directive or an assembly reference, but I am having a hard time figuring out which one.
Once I figure that out, how would I check for both conditions at once, i.e. that the same combination of checkThis and checkThat occurs more than once?
UPDATE: What I came up with
This is the LINQ query that I came up with after some quick research:
test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count()
I am not certain whether this is better than the approach in this answer:
var duplicates = test.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any());
I know I can put the first statement into an if/else. I also ran a quick test: on one of the sets the duplicates query gives me back 1 where I was expecting 0, although it did correctly detect the duplicates in the other set.
The other approach does exactly what I expect. Here are the data sets I used to test this:
Dupes:
List<dupeCheckee> test = new List<dupeCheckee>{
    new dupeCheckee("test0", "test1"),
    new dupeCheckee("test1", "test2"),
    new dupeCheckee("test2", "test3"),
    new dupeCheckee("test3", "test3"),
    new dupeCheckee("test0", "test5"),
    new dupeCheckee("test1", "test6"),
    new dupeCheckee("test2", "test7"),
    new dupeCheckee("test3", "test8"),
    new dupeCheckee("test0", "test5"), // same pair as the fifth entry
    new dupeCheckee("test1", "test1"),
    new dupeCheckee("test2", "test2"),
    new dupeCheckee("test3", "test3"), // same pair as the fourth entry
    new dupeCheckee("test4", "test4"),
};
No dupes...
List<dupeCheckee> test2 = new List<dupeCheckee>{
    new dupeCheckee("test0", "test1"),
    new dupeCheckee("test1", "test2"),
    new dupeCheckee("test2", "test3"),
    new dupeCheckee("test3", "test3"),
    new dupeCheckee("test4", "test5"),
    new dupeCheckee("test5", "test6"),
    new dupeCheckee("test6", "test7"),
    new dupeCheckee("test7", "test8"),
    new dupeCheckee("test8", "test5"),
    new dupeCheckee("test9", "test1"),
    new dupeCheckee("test2", "test2"),
    new dupeCheckee("test3", "test3"), // same pair as the fourth entry, so this set does contain one duplicate
    new dupeCheckee("test4", "test4"),
};
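To compare the two checks on the second data set, here is a minimal self-contained sketch. It repeats the question's dupeCheckee class (with a public constructor, which the original lacks), and the SetCheck class and its method names are hypothetical. Note that the pair ("test3", "test3") appears twice in the "No dupes..." set, so both checks should report one duplicate there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The class from the question, with a public constructor so it can be instantiated.
public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }

    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}

// Hypothetical helper class that runs both checks from the question.
public static class SetCheck
{
    // Approach 1: compare the total count with the count of distinct (checkThis, checkThat) pairs.
    public static bool HasDupesByDistinctCount(List<dupeCheckee> list) =>
        list.Count != list.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count();

    // Approach 2: group by the pair and count the groups that contain more than one item.
    public static int CountDuplicateGroups(List<dupeCheckee> list) =>
        list.GroupBy(x => new { x.checkThis, x.checkThat })
            .Count(g => g.Skip(1).Any());

    // The "No dupes..." data set from the question.
    public static List<dupeCheckee> NoDupesSet() => new List<dupeCheckee>
    {
        new dupeCheckee("test0", "test1"), new dupeCheckee("test1", "test2"),
        new dupeCheckee("test2", "test3"), new dupeCheckee("test3", "test3"),
        new dupeCheckee("test4", "test5"), new dupeCheckee("test5", "test6"),
        new dupeCheckee("test6", "test7"), new dupeCheckee("test7", "test8"),
        new dupeCheckee("test8", "test5"), new dupeCheckee("test9", "test1"),
        new dupeCheckee("test2", "test2"), new dupeCheckee("test3", "test3"),
        new dupeCheckee("test4", "test4"),
    };
}
```

Both approaches agree: the distinct-count check returns true and the GroupBy query finds exactly one duplicate group for that set.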

You need to add using System.Linq; at the top of the file (on .NET 3.5 the project also needs a reference to System.Core.dll),
then you can do:
var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any());
This gives you one group per duplicated (checkThis, checkThat) pair, containing all of the items that share it.
The test for duplicates would then be
var hasDupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any()).Any();
Or call ToList() or ToArray() to force evaluation of the query; then you can both check for duplicates and examine them without running the grouping twice.
For example:
var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any()).ToArray();
if (dupes.Any()) {
    foreach (var group in dupes) {
        Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
            group.Key.checkThis,
            group.Key.checkThat,
            group.Count() - 1));
    }
}
Alternatively
var dupes = dupList.Select((x, i) => new { index = i, value = x})
.GroupBy(x => new {x.value.checkThis, x.value.checkThat})
.Where(x => x.Skip(1).Any());
This gives you the same groups, but each item in a group stores its original position in the list in the index property and the object itself in the value property.
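The index-tracking variant can be taken one step further to return just the positions of the duplicates. A minimal sketch; DupeIndexes.Find is a hypothetical generic helper, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DupeIndexes
{
    // Returns, for every key that occurs more than once, the zero-based positions
    // of all items with that key, in original list order.
    public static List<int[]> Find<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        return source
            .Select((value, index) => new { index, value })   // remember each item's position
            .GroupBy(x => keySelector(x.value))                // group on the caller's key
            .Where(g => g.Skip(1).Any())                       // keep only duplicated keys
            .Select(g => g.Select(x => x.index).ToArray())     // keep just the positions
            .ToList();
    }
}
```

With the question's list and the key x => new { x.checkThis, x.checkThat }, this should yield [0, 3] and [1, 4], because GroupBy keeps groups in order of first appearance and items within a group in their original order.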

There are plenty of working solutions already, but I think the following one is more transparent and easier to understand than all of the above:
var hasDuplicatedEntries = ListWithPossibleDuplicates
.GroupBy(YourGroupingExpression)
.Any(e => e.Count() > 1);
if(hasDuplicatedEntries)
{
    // Do whatever you need to when the list contains duplicates
}

I like using this to find out whether there are any duplicates at all. Let's say you had a string and wanted to know if there were any duplicate letters. This is what I use:
string text = "this is some text";
var hasDupes = text.GroupBy(x => x).Any(grp => grp.Count() > 1);
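If you want to know how many different characters are duplicated, no matter which ones they are, use this:
var totalDupeItems = text.GroupBy(x => x).Count(grp => grp.Count() > 1);
So for example, "this is some text" has this...
total of letter t: 3
total of letter i: 2
total of letter s: 3
total of letter e: 2
total of space character: 3
So the variable totalDupeItems would equal 5: there are 5 different kinds of duplicates, because the space is a character like any other. If you only care about letters, filter first, e.g. text.Where(char.IsLetter).GroupBy(x => x), which gives 4.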
If you want the total number of duplicated items, no matter what the duplicates are, use this:
var totalDupes = text.GroupBy(x => x).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());
So the variable totalDupes would be 13 (3 + 2 + 3 + 2 + 3, counting the three spaces), or 10 if you filter down to letters first. This is the total number of items in each duplicate group added together.
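To check the arithmetic above, the counting queries can be wrapped in a small helper. A sketch only; the LetterDupes class and its method names are made up for illustration. The letter-only variants filter with char.IsLetter before grouping, which is one way to get the letter-only figures.

```csharp
using System.Linq;

// Hypothetical helper collecting the counting queries from this answer.
public static class LetterDupes
{
    // How many different characters occur more than once (spaces included).
    public static int DistinctDuplicated(string text) =>
        text.GroupBy(x => x).Count(grp => grp.Count() > 1);

    // How many characters in total belong to a duplicated group (spaces included).
    public static int TotalInDuplicatedGroups(string text) =>
        text.GroupBy(x => x).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());

    // The same two counts after dropping everything that is not a letter.
    public static int DistinctDuplicatedLetters(string text) =>
        DistinctDuplicated(LettersOnly(text));

    public static int TotalInDuplicatedLetterGroups(string text) =>
        TotalInDuplicatedGroups(LettersOnly(text));

    static string LettersOnly(string text) => new string(text.Where(char.IsLetter).ToArray());
}
```

For "this is some text" these should give 5, 13, 4 and 10 respectively.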

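I think this is what you're looking for:
List<dupeCheckee> duplicates = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
                                      .SelectMany(g => g.Skip(1))
                                      .ToList();
This returns every item after the first occurrence of its (checkThis, checkThat) pair. Grouping by x => x would only work if dupeCheckee overrode Equals and GetHashCode; otherwise every instance ends up in its own group.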
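The Skip(1) trick can be wrapped in a reusable generic helper. This is a sketch; ExtraCopies.Of is a hypothetical name. It returns only the surplus copies, so removing them from the list would leave exactly one item per key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtraCopies
{
    // Every item after the first occurrence of its key, i.e. the "extra" copies.
    public static List<T> Of<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector) =>
        source.GroupBy(keySelector)          // one group per key, in order of first appearance
              .SelectMany(g => g.Skip(1))    // drop the first item of each group, keep the rest
              .ToList();
}
```

With the question's list, ExtraCopies.Of(dupList, x => new { x.checkThis, x.checkThat }) should return the entries at positions 3 and 4 (the ones marked //dupe).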

For in-memory objects I always use the LINQ Distinct method with a custom comparer.
using System.Collections.Generic;
public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }
    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
    public class Comparer : IEqualityComparer<dupeCheckee>
    {
        public bool Equals(dupeCheckee x, dupeCheckee y)
        {
            if (ReferenceEquals(x, y))
                return true;   // same instance, or both null
            if (x == null || y == null)
                return false;
            return x.checkThis == y.checkThis && x.checkThat == y.checkThat;
        }
        public int GetHashCode(dupeCheckee obj)
        {
            if (obj == null)
                return 0;
            return (obj.checkThis == null ? 0 : obj.checkThis.GetHashCode()) ^
                   (obj.checkThat == null ? 0 : obj.checkThat.GetHashCode());
        }
    }
}
Now we can call
List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe...
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe
var distinct = dupList.Distinct(new dupeCheckee.Comparer()).ToList();
bool hasDupes = distinct.Count != dupList.Count;
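An alternative to a separate comparer, assuming C# 9 or later: a positional record generates Equals and GetHashCode over its properties, so Distinct() and GroupBy(x => x) compare by value without any extra code. DupeCheckeeRecord and RecordDupes are hypothetical names for this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

// A positional record compares by value: two instances with the same
// CheckThis and CheckThat are Equal and have the same hash code.
public record DupeCheckeeRecord(string CheckThis, string CheckThat);

public static class RecordDupes
{
    // Distinct() uses the record's generated equality, so no comparer is passed.
    public static bool HasDupes(IReadOnlyCollection<DupeCheckeeRecord> items) =>
        items.Distinct().Count() != items.Count;
}
```

The trade-off is that value equality then applies everywhere the type is used, which may or may not be what you want for a mutable class like the one in the question.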

Do a select distinct with LINQ (see e.g. the question "How can I do SELECT UNIQUE with LINQ?"),
and then compare the count of the distinct results with the count of the original list. If they differ, the list has duplicates.
Alternatively, you could use a Dictionary, which guarantees that each key is unique.
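Following the Dictionary suggestion, a sketch that counts how often each key occurs in a single pass; every key with a count above 1 is a duplicate. DictionaryDupes.CountKeys is a hypothetical helper, and the key can be an anonymous type such as new { x.checkThis, x.checkThat }, since anonymous types compare by value.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDupes
{
    // Counts how often each key occurs; keys with a count above 1 are duplicates.
    public static Dictionary<TKey, int> CountKeys<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var counts = new Dictionary<TKey, int>();
        foreach (var item in source)
        {
            var key = keySelector(item);
            counts.TryGetValue(key, out int n);   // n is 0 when the key is new
            counts[key] = n + 1;
        }
        return counts;
    }
}
```

For example, DictionaryDupes.CountKeys(dupList, x => new { x.checkThis, x.checkThat }).Where(kv => kv.Value > 1) would list each duplicated pair together with how many times it occurs.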

ToDictionary throws an ArgumentException if any key occurs more than once, because a Dictionary checks its keys itself.
This is the easiest way:
try
{
    dupList.ToDictionary(a => new { a.checkThis, a.checkThat });
}
catch (ArgumentException)
{
    // the list items are not unique
}

I introduced an extension method for this:
using System;
using System.Collections.Generic;
using System.Linq;
public static class CollectionExtensions
{
    public static bool HasDuplicatesByKey<TSource, TKey>(this IEnumerable<TSource> source
        , Func<TSource, TKey> keySelector)
    {
        return source.GroupBy(keySelector).Any(group => group.Skip(1).Any());
    }
}
Usage example:
if (items.HasDuplicatesByKey(item => item.Id))
{
    throw new InvalidOperationException($"Set {nameof(items)} has duplicates.");
}
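Since the question asks for a really fast check, a variant of the extension above can stop at the first repeated key instead of grouping the whole sequence first. A sketch under that assumption; HasDuplicatesByKeyFast is a hypothetical name. It relies on HashSet<T>.Add returning false when the item is already present.

```csharp
using System;
using System.Collections.Generic;

public static class FastDuplicateExtensions
{
    // Stops at the first repeated key instead of grouping the whole sequence.
    // HashSet<T>.Add returns false when the key is already in the set.
    public static bool HasDuplicatesByKeyFast<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            if (!seen.Add(keySelector(item)))
                return true;   // this key was seen before
        }
        return false;          // every key was unique
    }
}
```

Because it returns as soon as it sees a repeat, it even works on a sequence that is far too long to group, such as Enumerable.Repeat(5, int.MaxValue).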

Related

How to iterate through GroupBy groups using Dynamic LINQ? [duplicate]

I am using the Dynamic LINQ helper to group data. My code is as follows:
Employee[] empList = new Employee[6];
empList[0] = new Employee() { Name = "CA", State = "A", Department = "xyz" };
empList[1] = new Employee() { Name = "ZP", State = "B", Department = "xyz" };
empList[2] = new Employee() { Name = "AC", State = "B", Department = "xyz" };
empList[3] = new Employee() { Name = "AA", State = "A", Department = "xyz" };
empList[4] = new Employee() { Name = "A2", State = "A", Department = "pqr" };
empList[5] = new Employee() { Name = "BA", State = "B", Department = "pqr" };
var empqueryable = empList.AsQueryable();
var dynamiclinqquery = DynamicQueryable.GroupBy(empqueryable, "new (State, Department)", "it");
How can I get back each Key and its corresponding list of grouped items, i.e. an IEnumerable of {Key, List}, from dynamiclinqquery?
I solved the problem by defining a selector that projects the Key as well as the list of employees:
var eq = empqueryable.GroupBy("new (State, Department)", "it").Select("new(it.Key as Key, it as Employees)");
var keyEmplist = (from dynamic dat in eq select dat).ToList();
foreach (var group in keyEmplist)
{
    var key = group.Key;
    var elist = group.Employees;
    foreach (var emp in elist)
    {
        // process each employee of this group here
    }
}
The GroupBy method returns a non-generic IQueryable, but its elements should still implement IGrouping<TKey, TElement>.
You can't name the generated key type, but because IGrouping is covariant in its key (.NET 4 and later), you can still type the loop variable as IGrouping<object, Employee> and make calls on it, like so:
foreach (IGrouping<object, Employee> group in dynamiclinqquery)
{
// Print out the key.
Console.WriteLine("Key: {0}", group.Key);
// Write the items.
foreach (var item in group)
{
Console.WriteLine("Item: {0}", item);
}
}
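When the grouping keys are known at compile time, the same grouping can be written with plain LINQ and no dynamic query at all; each group then exposes a strongly typed Key. A sketch for comparison, with an Employee class reduced to the three properties used in the question and a hypothetical TypedGrouping.Describe helper.

```csharp
using System.Collections.Generic;
using System.Linq;

// Reduced to the properties used in the question.
public class Employee
{
    public string Name { get; set; }
    public string State { get; set; }
    public string Department { get; set; }
}

public static class TypedGrouping
{
    // Strongly typed equivalent of GroupBy("new (State, Department)", "it"):
    // each group exposes Key.State / Key.Department and enumerates its employees.
    public static List<string> Describe(IEnumerable<Employee> employees)
    {
        var lines = new List<string>();
        foreach (var group in employees.GroupBy(e => new { e.State, e.Department }))
        {
            var names = string.Join(", ", group.Select(e => e.Name));
            lines.Add($"{group.Key.State}/{group.Key.Department}: {names}");
        }
        return lines;
    }
}
```

For the six employees in the question this should produce four lines, starting with "A/xyz: CA, AA" and "B/xyz: ZP, AC".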

list inside another list select

What is the correct syntax for the second (inner) list? BookId and the other fields are not recognized.
var bookssublist = from bookdetails in bookslist
join bookcategories in _context.BookCategories
on bookdetails.BookId equals bookcategories.BookId
where bookcategories.CategoryId==CategoryId
select new BookBasicInfo {
count = bookcount,
BookInfo = new List<BookInfo>()
{
BookId = bookdetails.BookId,
BookTitle = bookdetails.Title,
Images = bookdetails.ThumbnailImagePath,
PublishDate = bookdetails.PublishedDate,
AuthorList = bookdetails.BookAuthors.Select(q => q.Author.Author1).ToList(),
CategoryList =bookdetails.BookCategories.Select(q=>q.Category.CategoryName).ToList(),
}
};
You are using the collection initializer the wrong way: you forgot to put an object of type BookInfo inside it.
BookInfo = new List<BookInfo>()
{
new BookInfo()
{
BookId = bookdetails.BookId,
BookTitle = bookdetails.Title,
Images = bookdetails.ThumbnailImagePath,
PublishDate = bookdetails.PublishedDate,
AuthorList = bookdetails.BookAuthors.Select(q => q.Author.Author1).ToList(),
CategoryList =bookdetails.BookCategories.Select(q=>q.Category.CategoryName).ToList()
}
}
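A self-contained version of the fix, using trimmed-down, hypothetical BookInfo and BookBasicInfo classes (the real ones have more properties), shows the shape: the outer braces are the collection initializer, and each element inside must be a new BookInfo { ... } object initializer.

```csharp
using System.Collections.Generic;

// Hypothetical, trimmed-down versions of the question's classes.
public class BookInfo
{
    public int BookId { get; set; }
    public string BookTitle { get; set; }
}

public class BookBasicInfo
{
    public int count { get; set; }
    public List<BookInfo> BookInfo { get; set; }
}

public static class InitializerDemo
{
    public static BookBasicInfo Build(int bookId, string title, int bookcount) =>
        new BookBasicInfo
        {
            count = bookcount,
            // Collection initializer: each element must itself be a BookInfo object.
            BookInfo = new List<BookInfo>
            {
                new BookInfo { BookId = bookId, BookTitle = title }
            }
        };
}
```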

PIVOT with LINQ from Datatable [duplicate]

I have a collection of items that contain an Enum (TypeCode) and a User object, and I need to flatten it out to show in a grid. It's hard to explain, so let me show a quick example.
Collection has items like so:
TypeCode | User
---------------
1 | Don Smith
1 | Mike Jones
1 | James Ray
2 | Tom Rizzo
2 | Alex Homes
3 | Andy Bates
I need the output to be:
1 | 2 | 3
Don Smith | Tom Rizzo | Andy Bates
Mike Jones | Alex Homes |
James Ray | |
I've tried doing this using foreach, but I can't do it that way because I'd be inserting new items into the collection inside the foreach, which causes an error.
Can this be done in Linq in a cleaner fashion?
I'm not saying it is a great way to pivot - but it is a pivot...
// sample data
var data = new[] {
new { Foo = 1, Bar = "Don Smith"},
new { Foo = 1, Bar = "Mike Jones"},
new { Foo = 1, Bar = "James Ray"},
new { Foo = 2, Bar = "Tom Rizzo"},
new { Foo = 2, Bar = "Alex Homes"},
new { Foo = 3, Bar = "Andy Bates"},
};
// group into columns, and select the rows per column
var grps = from d in data
group d by d.Foo
into grp
select new {
Foo = grp.Key,
Bars = grp.Select(d2 => d2.Bar).ToArray()
};
// find the total number of (data) rows
int rows = grps.Max(grp => grp.Bars.Length);
// output columns
foreach (var grp in grps) {
Console.Write(grp.Foo + "\t");
}
Console.WriteLine();
// output data
for (int i = 0; i < rows; i++) {
foreach (var grp in grps) {
Console.Write((i < grp.Bars.Length ? grp.Bars[i] : null) + "\t");
}
Console.WriteLine();
}
Marc's answer gives a sparse matrix that can't be bound to a grid directly.
I tried to extend the code from the link provided by Vasu as follows:
public static Dictionary<TKey1, Dictionary<TKey2, TValue>> Pivot3<TSource, TKey1, TKey2, TValue>(
this IEnumerable<TSource> source
, Func<TSource, TKey1> key1Selector
, Func<TSource, TKey2> key2Selector
, Func<IEnumerable<TSource>, TValue> aggregate)
{
return source.GroupBy(key1Selector).Select(
x => new
{
X = x.Key,
Y = source.GroupBy(key2Selector).Select(
z => new
{
Z = z.Key,
V = aggregate(from item in source
where key1Selector(item).Equals(x.Key)
&& key2Selector(item).Equals(z.Key)
select item
)
}
).ToDictionary(e => e.Z, o => o.V)
}
).ToDictionary(e => e.X, o => o.Y);
}
internal class Employee
{
public string Name { get; set; }
public string Department { get; set; }
public string Function { get; set; }
public decimal Salary { get; set; }
}
public void TestLinqExtenions()
{
var l = new List<Employee>() {
new Employee() { Name = "Fons", Department = "R&D", Function = "Trainer", Salary = 2000 },
new Employee() { Name = "Jim", Department = "R&D", Function = "Trainer", Salary = 3000 },
new Employee() { Name = "Ellen", Department = "Dev", Function = "Developer", Salary = 4000 },
new Employee() { Name = "Mike", Department = "Dev", Function = "Consultant", Salary = 5000 },
new Employee() { Name = "Jack", Department = "R&D", Function = "Developer", Salary = 6000 },
new Employee() { Name = "Demy", Department = "Dev", Function = "Consultant", Salary = 2000 }};
var result5 = l.Pivot3(emp => emp.Department, emp2 => emp2.Function, lst => lst.Sum(emp => emp.Salary));
var result6 = l.Pivot3(emp => emp.Function, emp2 => emp2.Department, lst => lst.Count());
}
I can't say anything about the performance, though.
You can use LINQ's ToLookup to group the data the way you want:
var lookup = data.ToLookup(d => d.TypeCode, d => d.User);
Then it's a matter of putting it into a form that your consumer can make sense of. For instance:
//Warning: untested code
var enumerators = lookup.Select(g => g.GetEnumerator()).ToList();
int columns = enumerators.Count;
while(columns > 0)
{
    for(int i = 0; i < enumerators.Count; ++i)
    {
        var enumerator = enumerators[i];
        if(enumerator == null) continue;
        if(!enumerator.MoveNext())
        {
            --columns;
            enumerators[i] = null;
        }
    }
    if(columns == 0) yield break; // every column is exhausted; don't emit an all-null row
    // ToList() captures the current values before the enumerators move on
    yield return enumerators.Select(e => (e != null) ? e.Current : null).ToList();
}
Put that in a method returning IEnumerable<> and it will (probably) return a collection (rows) of collections (columns) of User, where a null is put in a column that has no data.
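Following the suggestion to put the enumerator loop into an IEnumerable<> method, here is a sketch of such a method in generic form. PivotRows.ToRows is a hypothetical name; it takes any sequence of columns (an ILookup works, since each IGrouping is itself a sequence) and returns each row as a materialized array, so later rows cannot change earlier ones. Exhausted columns contribute default(T), which is null for reference types such as User.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PivotRows
{
    // Reads the columns side by side and yields one row per step; a column that has
    // run out contributes default(T) (null for reference types).
    public static IEnumerable<T[]> ToRows<T>(IEnumerable<IEnumerable<T>> columns)
    {
        var enumerators = columns.Select(c => c.GetEnumerator()).ToList();
        try
        {
            while (true)
            {
                var row = new T[enumerators.Count];
                bool anyValue = false;
                for (int i = 0; i < enumerators.Count; i++)
                {
                    // An exhausted enumerator keeps returning false from MoveNext.
                    if (enumerators[i].MoveNext())
                    {
                        row[i] = enumerators[i].Current;
                        anyValue = true;
                    }
                }
                if (!anyValue) yield break;   // every column is exhausted
                yield return row;
            }
        }
        finally
        {
            foreach (var e in enumerators) e.Dispose();
        }
    }
}
```

For the sample data, PivotRows.ToRows<string>(data.ToLookup(d => d.TypeCode, d => d.User)) should produce three rows, with nulls filling the shorter columns.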
I guess this is similar to Marc's answer, but I'll post it since I spent some time working on it. The results are separated by " | " as in your example. It also uses the IGrouping<int, string> type returned from the LINQ query when using a group by instead of constructing a new anonymous type. This is tested, working code.
var Items = new[] {
new { TypeCode = 1, UserName = "Don Smith"},
new { TypeCode = 1, UserName = "Mike Jones"},
new { TypeCode = 1, UserName = "James Ray"},
new { TypeCode = 2, UserName = "Tom Rizzo"},
new { TypeCode = 2, UserName = "Alex Homes"},
new { TypeCode = 3, UserName = "Andy Bates"}
};
var Columns = from i in Items
group i.UserName by i.TypeCode;
Dictionary<int, List<string>> Rows = new Dictionary<int, List<string>>();
int RowCount = Columns.Max(g => g.Count());
for (int i = 0; i <= RowCount; i++) // Row 0 is the header row.
{
Rows.Add(i, new List<string>());
}
int RowIndex;
foreach (IGrouping<int, string> c in Columns)
{
Rows[0].Add(c.Key.ToString());
RowIndex = 1;
foreach (string user in c)
{
Rows[RowIndex].Add(user);
RowIndex++;
}
    for (int r = RowIndex; r <= RowCount; r++) // pad the rest of this column with blanks
{
Rows[r].Add(string.Empty);
}
}
foreach (List<string> row in Rows.Values)
{
Console.WriteLine(row.Aggregate((current, next) => current + " | " + next));
}
Console.ReadLine();
I also tested it with this input:
var Items = new[] {
new { TypeCode = 1, UserName = "Don Smith"},
new { TypeCode = 3, UserName = "Mike Jones"},
new { TypeCode = 3, UserName = "James Ray"},
new { TypeCode = 2, UserName = "Tom Rizzo"},
new { TypeCode = 2, UserName = "Alex Homes"},
new { TypeCode = 3, UserName = "Andy Bates"}
};
Which produced the following results showing that the first column doesn't need to contain the longest list. You could use OrderBy to get the columns ordered by TypeCode if needed.
1 | 3 | 2
Don Smith | Mike Jones | Tom Rizzo
| James Ray | Alex Homes
| Andy Bates |
@Sanjaya.Tio I was intrigued by your answer and created this adaptation, which minimizes the number of key selector calls (untested):
public static Dictionary<TKey1, Dictionary<TKey2, TValue>> Pivot3<TSource, TKey1, TKey2, TValue>(
this IEnumerable<TSource> source
, Func<TSource, TKey1> key1Selector
, Func<TSource, TKey2> key2Selector
, Func<IEnumerable<TSource>, TValue> aggregate)
{
var lookup = source.ToLookup(x => new {Key1 = key1Selector(x), Key2 = key2Selector(x)});
List<TKey1> key1s = lookup.Select(g => g.Key.Key1).Distinct().ToList();
List<TKey2> key2s = lookup.Select(g => g.Key.Key2).Distinct().ToList();
var resultQuery =
from key1 in key1s
from key2 in key2s
let lookupKey = new {Key1 = key1, Key2 = key2}
let g = lookup[lookupKey]
let resultValue = g.Any() ? aggregate(g) : default(TValue)
select new {Key1 = key1, Key2 = key2, ResultValue = resultValue};
Dictionary<TKey1, Dictionary<TKey2, TValue>> result = new Dictionary<TKey1, Dictionary<TKey2, TValue>>();
foreach(var resultItem in resultQuery)
{
TKey1 key1 = resultItem.Key1;
TKey2 key2 = resultItem.Key2;
TValue resultValue = resultItem.ResultValue;
if (!result.ContainsKey(key1))
{
result[key1] = new Dictionary<TKey2, TValue>();
}
var subDictionary = result[key1];
subDictionary[key2] = resultValue;
}
return result;
}

Compare String with split in contains - LINQ

My requirement is to compare the values in a comma-separated string with a list of strings.
Code:
string Names = "Prabha,Karan";
List<string> Presenter = new List<string> { "Prabha", "Joe", "Hukm" };
bool Presented = Presenter.Contains(Names.Split(','));
The code above does not compile. What I need is to find out whether the names are present in Presenter (i.e. whether Presenter contains the split values of Names).
You could do it like this:
var splitNames = Names.Split(',');
bool Presented = Presenter.Any(p => splitNames.Contains(p));
EDIT:
If you're interested in which names match, just do:
var matches = Presenter.Where(p => splitNames.Contains(p));
string names = "Prabha,Karan";
List<string> presenter = new List<string> { "Prabha", "Joe", "Hukm" };
IEnumerable<string> namesList = names.Split(',').Select(x => x.Trim());
var list = presenter.Intersect(namesList);
bool presented = namesList.Count() == list.Count();
The unit tests to cover your case:
[Test]
public void AllSourceEntriesAreFoundInTheTargetList()
{
string names = "Prabha,Karan";
List<string> presenter = new List<string> { "Prabha", "Joe", "Hukm" };
IEnumerable<string> namesList = names.Split(',').Select(x => x.Trim());
var list = presenter.Intersect(namesList);
Assert.AreNotEqual(namesList.Count(), list.Count());
presenter = new List<string> { "Prabha", "Karan", "SomeAnother" };
var list1 = presenter.Intersect(namesList);
Assert.AreEqual(namesList.Count(), list1.Count());
}
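The two answers answer slightly different questions: Any asks whether at least one name is in the list, while the Intersect/count comparison asks whether all of them are. A sketch with both, under the assumption that names are comma-separated and surrounding spaces should be ignored; NameCheck is a hypothetical helper. Using All directly also avoids a subtle issue with the Intersect version: Intersect returns distinct elements, so "Prabha,Prabha" would compare 2 against 1 and report false even though every name is present.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NameCheck
{
    // Splits "a, b" into trimmed, non-empty names.
    static string[] SplitNames(string names) =>
        names.Split(',').Select(n => n.Trim()).Where(n => n.Length > 0).ToArray();

    // True if at least one of the names is in the list.
    public static bool AnyPresent(string names, ICollection<string> presenters) =>
        SplitNames(names).Any(presenters.Contains);

    // True only if every one of the names is in the list.
    public static bool AllPresent(string names, ICollection<string> presenters) =>
        SplitNames(names).All(presenters.Contains);
}
```

For case-insensitive matching, the presenters could be passed as a HashSet<string> created with StringComparer.OrdinalIgnoreCase.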

LINQ Union with Constant Values

Very primitive question, but I am stuck (being a newbie, I guess). I have a function that is supposed to return the list of companies. I also want the caller to be able to specify a top element for the drop-down list (say, "None"). I have the following piece of code; how do I add the top element to the returned SelectList?
public static SelectList GetCompanies( bool onlyApproved, FCCIEntityDataContext entityDataContext, SelectListItem TopElement )
{
var cs = from c in entityDataContext.Corporates
where ( c.Approved == onlyApproved || onlyApproved == false )
select new
{
c.Id,
c.Company
};
    return new SelectList( cs.AsEnumerable(), "Id", "Company" );
}
Thanks!
This should work for you:
List<Corporate> corporates =
(from c in entityDataContext.Corporates
where (c.Approved == onlyApproved || onlyApproved == false)
select c).ToList();
corporates.Insert(0, new Corporate { Id = -1, Company = "None" }); // put "None" at the top
return new SelectList(corporates.AsEnumerable(), "Id", "Company");
This method has always worked for me.
public static SelectList GetCompanies( bool onlyApproved, FCCIEntityDataContext entityDataContext, SelectListItem TopElement )
{
var cs = from c in entityDataContext.Corporates
where ( c.Approved == onlyApproved || onlyApproved == false )
             select new SelectListItem {
                 Value = c.Id.ToString(),
                 Text = c.Company
             };
    var list = cs.ToList();
    list.Insert(0, TopElement);
    // SelectList.SelectedValue is read-only, so pass the selected value to the constructor
    return new SelectList( list, "Value", "Text", TopElement.Value );
}
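Update: I forgot the lesson I learned when I did this: you have to project the LINQ results as SelectListItem. Inserting an anonymous object into the list, as in this earlier attempt, does not work: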
cs.ToList().Insert(0, new { TopElement.ID, TopElement.Company });
You could convert it to a list as indicated or you could union the IQueryable result with a constant array of one element (and even sort it):
static void Main(string[] args)
{
var sampleData = new[] {
new { Id = 1, Company = "Acme", Approved = true },
new { Id = 2, Company = "Blah", Approved = true }
};
bool onlyApproved = true;
var cs = from c in sampleData
where (c.Approved == onlyApproved || onlyApproved == false)
select new
{
c.Id,
c.Company
};
cs = cs.Union(new [] {new { Id = -1, Company = "None" }}).OrderBy(c => c.Id);
foreach (var c in cs)
{
Console.WriteLine(String.Format("Id = {0}; Company = {1}", c.Id, c.Company));
}
Console.ReadKey();
}
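For comparison, a sketch of putting the caller's top element first without relying on sorting or Union. ListItem is a hypothetical stand-in for SelectListItem so the code runs without ASP.NET MVC, and TopItemHelper.WithTop is a made-up name. Concat keeps both sequences as they are, whereas Union also removes duplicates and OrderBy(c => c.Id) only puts "None" first because its Id of -1 happens to sort lowest.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SelectListItem so the sketch runs without ASP.NET MVC.
public class ListItem
{
    public string Value { get; set; }
    public string Text { get; set; }
}

public static class TopItemHelper
{
    // Puts the caller's "top" item first, followed by the query results in their own order.
    public static List<ListItem> WithTop(ListItem top, IEnumerable<ListItem> items) =>
        new[] { top }.Concat(items).ToList();
}
```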
