C# LINQ find duplicates in List - linq

Using LINQ, from a List<int>, how can I retrieve a list that contains entries repeated more than once and their values?

The easiest way to solve the problem is to group the elements by value and then pick a representative of each group that contains more than one element. In LINQ, this translates to:
var query = lst.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => y.Key)
.ToList();
If you want to know how many times the elements are repeated, you can use:
var query = lst.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => new { Element = y.Key, Counter = y.Count() })
.ToList();
This will return a List of an anonymous type, and each element will have the properties Element and Counter, to retrieve the information you need.
And lastly, if it's a dictionary you are looking for, you can use
var query = lst.GroupBy(x => x)
.Where(g => g.Count() > 1)
.ToDictionary(x => x.Key, y => y.Count());
This will return a dictionary, with your element as key, and the number of times it's repeated as value.
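For reference, here is a minimal, self-contained sketch of the dictionary variant above that you can paste into a console project. The class name GroupByDuplicatesDemo and the sample data are just for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDuplicatesDemo
{
    // Returns each value that occurs more than once, mapped to its occurrence count.
    public static Dictionary<int, int> DuplicateCounts(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(g => g.Key, g => g.Count());

    public static void Main()
    {
        var lst = new List<int> { 1, 2, 2, 3, 3, 3, 4 };
        foreach (var pair in DuplicateCounts(lst))
        {
            Console.WriteLine($"{pair.Key} appears {pair.Value} times");
        }
    }
}
```

For this sample list the dictionary has two entries: 2 mapped to 2 and 3 mapped to 3. The values 1 and 4 are filtered out because their groups contain a single element.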

Find out if an enumerable contains any duplicates (x.Key stands for whatever property you compare by; for a List<int>, group by x => x instead):
var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);
Find out if all values in an enumerable are unique:
var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);
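A small sketch of the two checks with the key selector passed in explicitly, so the same code works for plain values and for objects. The helper names AnyDuplicate and AllUnique are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniquenessChecks
{
    // True if at least two elements share the same key.
    public static bool AnyDuplicate<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector) =>
        source.GroupBy(keySelector).Any(g => g.Count() > 1);

    // True if no key occurs more than once (also true for an empty sequence).
    public static bool AllUnique<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector) =>
        source.GroupBy(keySelector).All(g => g.Count() == 1);

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 2 };
        Console.WriteLine(AnyDuplicate(numbers, x => x)); // True
        Console.WriteLine(AllUnique(numbers, x => x));    // False

        // Group by first letter: "apple" and "avocado" share the key 'a'.
        var words = new[] { "apple", "avocado", "banana" };
        Console.WriteLine(AnyDuplicate(words, w => w[0])); // True
    }
}
```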

Another way is using HashSet:
var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i));
If you want unique values in your duplicates list:
var myhash = new HashSet<int>();
var mylist = new List<int>(){1,1,2,2,3,3,3,4,4,4};
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();
Here is the same solution as a generic extension method:
public static class Extensions
{
public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
{
var hash = new HashSet<TKey>(comparer);
return source.Where(item => !hash.Add(selector(item))).ToList();
}
public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
{
return source.GetDuplicates(x => x, comparer);
}
public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
return source.GetDuplicates(selector, null);
}
public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
{
return source.GetDuplicates(x => x, null);
}
}
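One caveat with the lazy HashSet query (the version without ToList()): the Where clause captures the HashSet, so enumerating the query a second time gives a different result. The sketch below shows the effect and the usual fix of materializing the result once; the names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetDuplicatesDemo
{
    // Materializes the duplicates immediately so the HashSet state is used exactly once.
    public static List<int> DuplicateOccurrences(IEnumerable<int> source)
    {
        var seen = new HashSet<int>();
        return source.Where(i => !seen.Add(i)).ToList();
    }

    public static void Main()
    {
        var list = new List<int> { 1, 1, 2, 3, 3 };

        var hash = new HashSet<int>();
        var lazy = list.Where(i => !hash.Add(i)); // nothing has run yet

        Console.WriteLine(lazy.Count()); // 2: the first pass finds the repeated 1 and 3
        Console.WriteLine(lazy.Count()); // 5: hash already holds every value, so each Add fails

        Console.WriteLine(string.Join(",", DuplicateOccurrences(list))); // 1,3
    }
}
```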

To find the duplicate values only:
var duplicates = list.GroupBy(x => x).Where(g => g.Count() > 1).Select(g => g.Key);
E.g.
var list = new[] {1,2,3,1,4,2};
GroupBy groups equal numbers together, so each group's Count() is the number of times that value occurs. We then keep only the values that occur more than once (here 1 and 2).
To find the unique values only:
var unique = list.GroupBy(x => x).Where(g => g.Count() == 1).Select(g => g.Key);
E.g.
var list = new[] {1,2,3,1,4,2};
Here we keep only the values that occur exactly once (3 and 4), i.e. the unique ones.

You can do this:
var list = new[] {1,2,3,1,4,2};
var duplicateItems = list.Duplicates();
With these extension methods:
public static class Extensions
{
public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
var grouped = source.GroupBy(selector);
var moreThan1 = grouped.Where(i => i.IsMultiple());
return moreThan1.SelectMany(i => i);
}
public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)
{
return source.Duplicates(i => i);
}
public static bool IsMultiple<T>(this IEnumerable<T> source)
{
// Dispose the enumerator once we know whether a second element exists.
using (var enumerator = source.GetEnumerator())
{
return enumerator.MoveNext() && enumerator.MoveNext();
}
}
}
Using IsMultiple() in the Duplicates method is faster than Count() because this does not iterate the whole collection.

I created an extension class for this that you can include in your projects; I think it covers most cases where you need to find duplicates in a List or with LINQ.
Example:
//Dummy class to compare in list
public class Person
{
public int Id { get; set; }
public string Name { get; set; }
public string Surname { get; set; }
public Person(int id, string name, string surname)
{
this.Id = id;
this.Name = name;
this.Surname = surname;
}
}
//The extension static class
public static class Extention
{
public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
{ //Return only the second and subsequent repetitions
return extList
.GroupBy(groupProps)
.SelectMany(z => z.Skip(1)); //Skip the first occurrence and return all the others that repeat it
}
public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
{
//Get all the lines that are repeated
return extList
.GroupBy(groupProps)
.Where(z => z.Count() > 1) //Keep only groups with more than one item
.SelectMany(z => z);//Return every item in those groups
}
}
//how to use it:
void DuplicateExample()
{
//Populate List
List<Person> PersonsLst = new List<Person>(){
new Person(1,"Ricardo","Figueiredo"), //first duplicate in the example
new Person(2,"Ana","Figueiredo"),
new Person(3,"Ricardo","Figueiredo"),//second duplicate in the example
new Person(4,"Margarida","Figueiredo"),
new Person(5,"Ricardo","Figueiredo")//third duplicate in the example
};
Console.WriteLine("All:");
PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
All:
1 -> Ricardo Figueiredo
2 -> Ana Figueiredo
3 -> Ricardo Figueiredo
4 -> Margarida Figueiredo
5 -> Ricardo Figueiredo
*/
Console.WriteLine("All lines with repeated data");
PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
.ToList()
.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
All lines with repeated data
1 -> Ricardo Figueiredo
3 -> Ricardo Figueiredo
5 -> Ricardo Figueiredo
*/
Console.WriteLine("Only Repeated more than once");
PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
.ToList()
.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
Only Repeated more than once
3 -> Ricardo Figueiredo
5 -> Ricardo Figueiredo
*/
}

There is an answer above, but I could not understand why it was not working for me:
var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);
My solution in this situation was:
var duplicates = model.list
.GroupBy(s => s.SAME_ID)
.Where(g => g.Count() > 1).Count() > 0;
if(duplicates) {
doSomething();
}

A complete set of LINQ to SQL extension methods for finding duplicates, tested against MS SQL Server. They do not use .ToList() or IEnumerable, so the queries execute in SQL Server rather than in memory; only the results are returned to memory.
public static class Linq2SqlExtensions {
public class CountOfT<T> {
public T Key { get; set; }
public int Count { get; set; }
}
public static IQueryable<TKey> Duplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => s.Key);
public static IQueryable<TSource> GetDuplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).SelectMany(s => s);
public static IQueryable<CountOfT<TKey>> DuplicatesCounts<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(y => new CountOfT<TKey> { Key = y.Key, Count = y.Count() });
public static IQueryable<Tuple<TKey, int>> DuplicatesCountsAsTuble<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => Tuple.Create(s.Key, s.Count()));
}

Linq query:
var query = from s2 in (from s in someList group s by new { s.Column1, s.Column2 } into sg select sg) where s2.Count() > 1 select s2;

A simpler way that does not use groups: get the distinct elements, then iterate over them and count how often each one occurs in the list. If the count is greater than 1, the element appears more than once, so add it to Repeteditemlist.
var mylist = new List<int>() { 1, 1, 2, 3, 3, 3, 4, 4, 4 };
var distList= mylist.Distinct().ToList();
var Repeteditemlist = new List<int>();
foreach (var item in distList)
{
if(mylist.Count(e => e == item) > 1)
{
Repeteditemlist.Add(item);
}
}
foreach (var item in Repeteditemlist)
{
Console.WriteLine(item);
}
Expected output:
1
3
4

Just another approach:
For just HasDuplicate:
bool hasAnyDuplicate = list.Count > list.Distinct().Count();
For duplicate values:
List<string> duplicates = new List<string>();
duplicates.AddRange(list);
list.Distinct().ToList().ForEach(x => duplicates.Remove(x));
// for unique duplicate values:
duplicates.Distinct();

All the GroupBy answers are the simplest but won't be the most efficient. They're especially bad for memory performance as building large inner collections has allocation cost.
A decent alternative is HuBeZa's HashSet.Add based approach. It performs better.
If you don't care about nulls, something like this is the most efficient (both CPU and memory) as far as I can think:
public static IEnumerable<TProperty> Duplicates<TSource, TProperty>(
this IEnumerable<TSource> source,
Func<TSource, TProperty> duplicateSelector,
IEqualityComparer<TProperty> comparer = null)
{
comparer ??= EqualityComparer<TProperty>.Default;
Dictionary<TProperty, int> counts = new Dictionary<TProperty, int>(comparer);
foreach (var item in source)
{
TProperty property = duplicateSelector(item);
counts.TryGetValue(property, out int count);
switch (count)
{
case 0:
counts[property] = ++count;
break;
case 1:
counts[property] = ++count;
yield return property;
break;
}
}
}
The trick here is to skip the second dictionary lookup (the write) once an item has already been reported as a duplicate. Of course you could keep updating the dictionary with the count if you also want the number of occurrences for each item. For nulls, you just need some additional handling, since a Dictionary does not accept a null key.
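If you do want the occurrence counts as mentioned above, a sketch of that variant could look like this. It is not the streaming version from the answer: it counts everything in one pass and then filters, and the name DuplicateCounts is just for this example.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateCounting
{
    // Single pass: counts every key, then keeps only those seen more than once.
    public static Dictionary<TKey, int> DuplicateCounts<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        var counts = new Dictionary<TKey, int>(comparer ?? EqualityComparer<TKey>.Default);
        foreach (var item in source)
        {
            TKey key = keySelector(item);
            // count stays 0 when the key is new; a null key would throw here.
            counts.TryGetValue(key, out int count);
            counts[key] = count + 1;
        }

        var result = new Dictionary<TKey, int>(counts.Comparer);
        foreach (var pair in counts)
        {
            if (pair.Value > 1) result[pair.Key] = pair.Value;
        }
        return result;
    }

    public static void Main()
    {
        var words = new[] { "a", "B", "b", "c", "A", "a" };
        var dupes = DuplicateCounts(words, w => w, StringComparer.OrdinalIgnoreCase);
        foreach (var pair in dupes)
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

With the case-insensitive comparer, "a" is counted 3 times and "B" twice, while "c" is dropped because it occurs only once.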

Remove duplicates by key
myTupleList = myTupleList.GroupBy(tuple => tuple.Item1).Select(group => group.First()).ToList();
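A runnable sketch of that one-liner with sample data (the tuple contents are made up). In LINQ to Objects, GroupBy yields groups in the order their keys first appear, so the first tuple for each key is the one that survives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctByKeyDemo
{
    // Keeps the first tuple seen for each Item1 value.
    public static List<Tuple<int, string>> FirstPerKey(IEnumerable<Tuple<int, string>> source) =>
        source.GroupBy(t => t.Item1)
              .Select(g => g.First())
              .ToList();

    public static void Main()
    {
        var myTupleList = new List<Tuple<int, string>>
        {
            Tuple.Create(1, "first"),
            Tuple.Create(2, "second"),
            Tuple.Create(1, "duplicate key"),
        };

        foreach (var t in FirstPerKey(myTupleList))
        {
            Console.WriteLine($"{t.Item1}: {t.Item2}");
        }
        // 1: first
        // 2: second
    }
}
```

On .NET 6 or later, myTupleList.DistinctBy(t => t.Item1) expresses the same intent without the explicit grouping.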

Related

Grouped custom object with a list property

I have a list of customObject, I want to group the "CustomObject" by the List property of the CustomObject object.
public class CustomObject
{
public string Name { get; set; }
public List<string> List { get; set; }
public CustomObject(string name, List<string> list)
{
this.Name = name;
this.List = list;
}
}
.....................
List<CustomObject> listCustomObject = new List<CustomObject>()
{
new CustomObject("A", new List<string>(){ "1","2","3", "4"} ),
new CustomObject("B", new List<string>(){ "4","8","5"}),
new CustomObject("C", new List<string>(){ "5","1","2", "4"})
};
Desired results :
"A"/"C" => identical item in the list ("1", "2")
"A"/"B"/"C" => identical item in the list ("4")
"B"/"C" => identical item in the list ("5")
Using some extension methods, you can generate all combinations of the inputs having at least two members:
public static IEnumerable<IEnumerable<T>> AtLeastCombinations<T>(this IEnumerable<T> elements, int minK) => Enumerable.Range(minK, elements.Count()+1-minK).SelectMany(k => elements.Combinations(k));
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k) {
return k == 0 ? new[] { new T[0] } :
elements.SelectMany((e, i) =>
elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] { e }).Concat(c)));
}
Now you can simply test each combination to see if they have any common elements:
var ans = listCustomObject.AtLeastCombinations(2)
.Select(c => new { CombinationNames = c.Select(co => co.Name).ToList(), CombinationIntersect = c.Select(co => co.List).Aggregate((sofar, coList) => sofar.Intersect(coList).ToList()) })
.Where(ci => ci.CombinationIntersect.Count > 0)
.ToList();

Lambdas in Linq AST - why different behaviour?

Let's assume the following demo classes.
public class Foo {
public int key1 {get; set;}
public Foo(int _key1) {
key1 = _key1;
}
}
public class Bar {
public int key2 {get; set;}
public Bar(int _key2) {
key2 = _key2;
}
}
They are combined together in a simple Linq join.
Foo[]aSet = new Foo[3]{new Foo(1),new Foo(2),new Foo(3)};
Bar[]bSet = new Bar[3]{new Bar(1),new Bar(3),new Bar(5)};
Func<int,Func<Foo,bool>> VisibleLambda = w => x => x.key1 > w;
var pb = Expression.Parameter(typeof(Bar),"z");
var pf = Expression.Parameter(typeof(Foo), "y");
PropertyInfo BarId = typeof(Bar).GetProperty("key2");
PropertyInfo FooId = typeof(Foo).GetProperty("key1");
var eqexpr = Expression.Equal(Expression.Property(pb, BarId), Expression.Property(pf, FooId));
var lambdaInt = Expression.Lambda<Func<Bar, bool>>(eqexpr, pb);
var InvisibleLambda = Expression.Lambda<Func<Foo,Func<Bar, bool>>>( lambdaInt,pf);
var query = from a in aSet.Where(VisibleLambda(1))
from b in bSet.Where(InvisibleLambda.Compile()(a))
select new Tuple<Foo,Bar>(a,b);
Now, the query is implemented through an extension
IQueryable<TElement> IQueryProvider.CreateQuery<TElement>(Expression expression)
{
if (expression == null)
throw new ArgumentNullException("expression");
return new ExpressionQueryImpl<TElement>(DataContextInfo, expression);
}
The details of the implementation are irrelevant: my question is only related to the expression derived from the IQueryable.
There are two lambdas: one ("visible") is generated as an argument of the expression with a NodeType Quote that is very easy to analyse, while the other one ("invisible") is generated as a second argument of the expression with "where" clause of NodeType Invoke that is almost invisible in terms of its sql rendering.
Why is that happening, and is there a way to work around it?
As pointed out by the comments 1 and 2 of Ivan Stoev, the different behaviour, and in particular the problem in the sql generation, was due to different signature expected from the Queryable.Where
Here is the solution from Igor Tkachev, for anyone who would be interested.
Everything boils down to implementing the helpful extension, where one can leverage the linq method with the appropriate signature: i.e the Queryable.GroupJoin :-)
static class ExpressionTestExtensions
{
public class LeftJoinInfo<TOuter,TInner>
{
public TOuter Outer;
public TInner Inner;
}
[ExpressionMethod("LeftJoinImpl")]
public static IQueryable<LeftJoinInfo<TOuter,TInner>> LeftJoin<TOuter, TInner, TKey>(
this IQueryable<TOuter> outer,
IEnumerable<TInner> inner,
Expression<Func<TOuter, TKey>> outerKeySelector,
Expression<Func<TInner, TKey>> innerKeySelector)
{
return outer
.GroupJoin(inner, outerKeySelector, innerKeySelector, (o, gr) => new { o, gr })
.SelectMany(t => t.gr.DefaultIfEmpty(), (o,i) => new LeftJoinInfo<TOuter,TInner> { Outer = o.o, Inner = i });
}
static Expression<Func<
IQueryable<TOuter>,
IEnumerable<TInner>,
Expression<Func<TOuter,TKey>>,
Expression<Func<TInner,TKey>>,
IQueryable<LeftJoinInfo<TOuter,TInner>>>>
LeftJoinImpl<TOuter, TInner, TKey>()
{
return (outer,inner,outerKeySelector,innerKeySelector) => outer
.GroupJoin(inner, outerKeySelector, innerKeySelector, (o, gr) => new { o, gr })
.SelectMany(t => t.gr.DefaultIfEmpty(), (o,i) => new LeftJoinInfo<TOuter,TInner> { Outer = o.o, Inner = i });
}
}
Having defined such an extension, my "generic join" will turn to
static internal IQueryable<ExpressionTestExtensions.LeftJoinInfo<T2,T1>> NewJoin<T1, T2, TKey>(Expression<Func<T2, TKey>> outer, Expression<Func<T1, TKey>> inner)
where T2: class
where T1 : class
{
using (var db = new MyContext()) {
var query = (from b in db.GetTable<T2>() select b).LeftJoin <T2,T1, TKey>((from f in db.GetTable<T1>() select f), outer, inner);
return query;
}
}
}
Finally, the elegant use case simply becomes
public static void Main(string[] args)
{
Console.WriteLine("Hello World!");
//var queryList = Test.Join<Bar, Foo>(b => q => q.id == b.id);
var queryList = Test.NewJoin<Bar, Foo, int>(q => q.id, b => b.id);
foreach (var telement in queryList)
{
var bar = telement.Inner as Bar;
var element = telement.Outer as Foo;
Console.WriteLine(element.id.ToString() + " " + element.FromDate.ToShortDateString() +" "
+bar.id.ToString() + " " + bar.Name
);
}
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}

How can I transform this linq expression?

Say I have an entity that I want to query with ranking applied:
public class Person: Entity
{
public int Id { get; protected set; }
public string Name { get; set; }
public DateTime Birthday { get; set; }
}
In my query I have the following:
Expression<Func<Person, object>> orderBy = x => x.Name;
var dbContext = new MyDbContext();
var keyword = "term";
var startsWithResults = dbContext.People
.Where(x => x.Name.StartsWith(keyword))
.Select(x => new {
Rank = 1,
Entity = x,
});
var containsResults = dbContext.People
.Where(x => !startsWithResults.Select(y => y.Entity.Id).Contains(x.Id))
.Where(x => x.Name.Contains(keyword))
.Select(x => new {
Rank = 2,
Entity = x,
});
var rankedResults = startsWithResults.Concat(containsResults)
.OrderBy(x => x.Rank);
// TODO: apply thenby ordering here based on the orderBy expression above
dbContext.Dispose();
I have tried ordering the results before selecting the anonymous object with the Rank property, but the ordering ends up getting lost. It seems that linq to entities discards the ordering of the separate sets and converts back to natural ordering during both Concat and Union.
What I think I may be able to do is dynamically transform the expression defined in the orderBy variable from x => x.Name to x => x.Entity.Name, but I'm not sure how:
if (orderBy != null)
{
var transformedExpression = ???
rankedResults = rankedResults.ThenBy(transformedExpression);
}
How might I be able to use Expression.Lambda to wrap x => x.Name into x => x.Entity.Name? When I hard code x => x.Entity.Name into the ThenBy I get the ordering that I want, but the orderBy is provided by the calling class of the query, so I don't want to hard-code it in. I have it hardcoded in the example above for simplicity of explanation only.
This should help. However, you will have to replace the anonymous type with a concrete one for this to work. My LinqPropertyChain will not work with it, since it is difficult to create the Expression<Func<Anonymous, Person>> while the type is still anonymous.
Expression<Func<Person, object>> orderBy = x => x.Name;
using(var dbContext = new MyDbContext())
{
var keyword = "term";
var startsWithResults = dbContext.People
.Where(x => x.Name.StartsWith(keyword))
.Select(x => new {
Rank = 1,
Entity = x,
});
var containsResults = dbContext.People
.Where(x => !startsWithResults.Select(y => y.Entity.Id).Contains(x.Id))
.Where(x => x.Name.Contains(keyword))
.Select(x => new {
Rank = 2,
Entity = x,
});
var rankedResults = startsWithResults.Concat(containsResults)
.OrderBy(x => x.Rank)
.ThenBy(LinqPropertyChain.Chain(x => x.Entity, orderBy));
// TODO: apply thenby ordering here based on the orderBy expression above
}
public static class LinqPropertyChain
{
public static Expression<Func<TInput, TOutput>> Chain<TInput, TOutput, TIntermediate>(
Expression<Func<TInput, TIntermediate>> outter,
Expression<Func<TIntermediate, TOutput>> inner
)
{
Console.WriteLine(inner);
Console.WriteLine(outter);
var visitor = new Visitor(new Dictionary<ParameterExpression, Expression>
{
{inner.Parameters[0], outter.Body}
});
var newBody = visitor.Visit(inner.Body);
Console.WriteLine(newBody);
return Expression.Lambda<Func<TInput, TOutput>>(newBody, outter.Parameters);
}
private class Visitor : ExpressionVisitor
{
private readonly Dictionary<ParameterExpression, Expression> _replacement;
public Visitor(Dictionary<ParameterExpression, Expression> replacement)
{
_replacement = replacement;
}
protected override Expression VisitParameter(ParameterExpression node)
{
if (_replacement.ContainsKey(node))
return _replacement[node];
else
{
return node;
}
}
}
}
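To make the "concrete type" point tangible, here is a self-contained sketch that runs against an in-memory IQueryable instead of a DbContext. Ranked<T> is a hypothetical stand-in for the anonymous { Rank, Entity } type, and Chain is a trimmed copy of the idea above, so treat it as an illustration rather than the exact class from the answer.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public class Person
{
    public string Name { get; set; }
}

// Hypothetical concrete replacement for the anonymous { Rank, Entity } type.
public class Ranked<T>
{
    public int Rank { get; set; }
    public T Entity { get; set; }
}

public static class ChainDemo
{
    // Rewrites inner so that its parameter is replaced by outer's body:
    // (r => r.Entity) chained with (p => p.Name) becomes r => r.Entity.Name.
    public static Expression<Func<TIn, TOut>> Chain<TIn, TMid, TOut>(
        Expression<Func<TIn, TMid>> outer,
        Expression<Func<TMid, TOut>> inner)
    {
        var body = new Replacer(inner.Parameters[0], outer.Body).Visit(inner.Body);
        return Expression.Lambda<Func<TIn, TOut>>(body, outer.Parameters);
    }

    private sealed class Replacer : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly Expression _to;

        public Replacer(ParameterExpression from, Expression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node) =>
            node == _from ? _to : node;
    }

    public static void Main()
    {
        Expression<Func<Person, object>> orderBy = x => x.Name;
        Expression<Func<Ranked<Person>, Person>> entity = r => r.Entity;
        var chained = Chain(entity, orderBy);

        var results = new[]
        {
            new Ranked<Person> { Rank = 2, Entity = new Person { Name = "Zed" } },
            new Ranked<Person> { Rank = 1, Entity = new Person { Name = "Bob" } },
            new Ranked<Person> { Rank = 1, Entity = new Person { Name = "Amy" } },
        };

        var ordered = results.AsQueryable()
            .OrderBy(r => r.Rank)
            .ThenBy(chained)
            .Select(r => r.Entity.Name)
            .ToList();

        Console.WriteLine(string.Join(", ", ordered)); // Amy, Bob, Zed
    }
}
```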
Figured out a way to do this with fewer explicit generics.
Expression<Func<Person, object>> orderBy = x => x.Name;
Expression<Func<Foo, Person>> personExpression = x => x.Person;
var helper = new ExpressionChain<Foo, Person>(personExpression);
var chained = helper.Chain(orderBy).Expression;
// Define other methods and classes here
public class ExpressionChain<TInput, TOutput>
{
private readonly Expression<Func<TInput, TOutput>> _expression;
public ExpressionChain(Expression<Func<TInput, TOutput>> expression)
{
_expression = expression;
}
public Expression<Func<TInput, TOutput>> Expression { get { return _expression; } }
public ExpressionChain<TInput, TChained> Chain<TChained>
(Expression<Func<TOutput, TChained>> chainedExpression)
{
// Replace the chained expression's parameter with the body of the current expression.
var visitor = new Visitor(new Dictionary<ParameterExpression, Expression>
{
{chainedExpression.Parameters[0], _expression.Body}
});
var newBody = visitor.Visit(chainedExpression.Body);
// Fully qualified because the Expression property above hides the type name here.
var lambda = System.Linq.Expressions.Expression.Lambda<Func<TInput, TChained>>(newBody, _expression.Parameters);
return new ExpressionChain<TInput, TChained>(lambda);
}
private class Visitor : ExpressionVisitor
{
private readonly Dictionary<ParameterExpression, Expression> _replacement;
public Visitor(Dictionary<ParameterExpression, Expression> replacement)
{
_replacement = replacement;
}
protected override Expression VisitParameter(ParameterExpression node)
{
if (_replacement.ContainsKey(node))
return _replacement[node];
else
{
return node;
}
}
}
}
Since you're ordering by Rank first, and the Rank values are identical within each sequence, you should be able to just sort independently and then concatenate. It sounds like the hiccup here would be that, according to your post, Entity Framework isn't maintaining sorting across Concat or Union operations. You should be able to get around this by forcing the concatenation to happen client-side:
var rankedResults = startsWithResults.OrderBy(orderBy)
.AsEnumerable()
.Concat(containsResults.OrderBy(orderBy));
This should render the Rank property unnecessary and probably simplify the SQL queries being executed against your database, and it doesn't require mucking about with expression trees.
The downside is that, once you call AsEnumerable(), you no longer have the option of appending additional database-side operations (i.e., if you chain additional LINQ operators after Concat, they will use the LINQ-to-collections implementations). Looking at your code, I don't think this would be a problem for you, but it's worth mentioning.

how to practically assign repeating objects from groups

I am having a difficult time finding a proper Linq query to utilize the group output.
I want to populate an existing students List, where the Student class has two properties: ID and an int[] Repeats array (it can be a list too) that records how many times the student took each of the 4 lectures (L101, L201, L202, L203). So if a student took L101 twice, L202 and L203 once, but didn't take L201, this should be {2,0,1,1}.
class Student{
public string ID{get;set;}
public int[] Repeats{get;set;} //int[0]->L101, int[1]->L201...
}
In my main class I do this basic operation for this task:
foreach (var student in students)
{
var countL101 = from s in rawData
where student.ID == s.Id && s.Lecture == "L101"
select s; //do for each lecture
student.Repeats = new int[4];
student.Repeats[0] = countL101.Count(); //do for each lecture
}
This works, but how would you do this practically with LINQ when there are hundreds of lectures?
I am using lambda expressions rather than query syntax. Assuming rawData is an IEnumerable<T> where T looks something like...
class DataRow
{
/// <summary>
/// Id of Student taking lecture
/// </summary>
public string Id { get; set; }
public string Lecture { get; set;}
}
Then you could do something like...
var lectures = rawData.Select(x => x.Lecture).Distinct().ToList();
int i = 0;
lectures.ForEach(l =>
{
students.ForEach(s =>
{
if (s.Repeats == null)
s.Repeats = new int[lectures.Count];
s.Repeats[i] = rawData.Count(x => x.Id == s.Id && x.Lecture == l);
});
i++;
});
Now if Repeats could just be of type IList<int> instead of int[] then...
var lectures = rawData.Select(x => x.Lecture).Distinct().ToList();
lectures.ForEach(l =>
{
students.ForEach(s =>
{
if (s.Repeats == null)
s.Repeats = new List<int>();
s.Repeats.Add(rawData.Count(x => x.Id == s.Id && x.Lecture == l));
});
});
Things are further simplified if Repeats could just be instantiated to a new List<int> in the Student constructor...
class Student
{
public Student()
{
Repeats = new List<int>();
}
public string Id { get; set; }
public IList<int> Repeats { get; private set; }
}
Then you can do it in one line...
rawData.Select(x => x.Lecture).Distinct().ToList()
.ForEach(l =>
{
students.ForEach(s =>
{
s.Repeats.Add(rawData.Count(x => x.Id == s.Id && x.Lecture == l));
});
});
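The versions above call rawData.Count(...) once per student per lecture, which rescans rawData every time. As an alternative sketch (not the approach from the answer above), you can group rawData once and look the counts up afterwards. CountRepeats is a made-up helper name, and the lecture order is assumed to be supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DataRow
{
    public string Id { get; set; }
    public string Lecture { get; set; }
}

public static class RepeatCounter
{
    // Builds studentId -> per-lecture counts, ordered like the lectures list.
    public static Dictionary<string, int[]> CountRepeats(
        IEnumerable<DataRow> rawData, IReadOnlyList<string> lectures)
    {
        return rawData
            .GroupBy(r => r.Id)
            .ToDictionary(
                g => g.Key,
                g =>
                {
                    // Count this student's rows once per lecture code.
                    var perLecture = g.GroupBy(r => r.Lecture)
                                      .ToDictionary(x => x.Key, x => x.Count());
                    return lectures
                        .Select(l => perLecture.TryGetValue(l, out var n) ? n : 0)
                        .ToArray();
                });
    }

    public static void Main()
    {
        var lectures = new[] { "L101", "L201", "L202", "L203" };
        var rawData = new List<DataRow>
        {
            new DataRow { Id = "s1", Lecture = "L101" },
            new DataRow { Id = "s1", Lecture = "L101" },
            new DataRow { Id = "s1", Lecture = "L202" },
            new DataRow { Id = "s1", Lecture = "L203" },
            new DataRow { Id = "s2", Lecture = "L201" },
        };

        var repeats = CountRepeats(rawData, lectures);
        Console.WriteLine(string.Join(",", repeats["s1"])); // 2,0,1,1
        Console.WriteLine(string.Join(",", repeats["s2"])); // 0,1,0,0
    }
}
```

Students with no rows in rawData get no dictionary entry, so when copying the arrays into Student.Repeats you would fall back to an all-zero array for them.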

Calculate difference from previous item with LINQ

I'm trying to prepare data for a graph using LINQ.
The problem that I can't solve is how to calculate the "difference to previous".
The result I expect is
ID= 1, Date= Now, DiffToPrev= 0;
ID= 1, Date= Now+1, DiffToPrev= 3;
ID= 1, Date= Now+2, DiffToPrev= 7;
ID= 1, Date= Now+3, DiffToPrev= -6;
etc...
Can you help me create such a query?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
public class MyObject
{
public int ID { get; set; }
public DateTime Date { get; set; }
public int Value { get; set; }
}
class Program
{
static void Main()
{
var list = new List<MyObject>
{
new MyObject {ID= 1,Date = DateTime.Now,Value = 5},
new MyObject {ID= 1,Date = DateTime.Now.AddDays(1),Value = 8},
new MyObject {ID= 1,Date = DateTime.Now.AddDays(2),Value = 15},
new MyObject {ID= 1,Date = DateTime.Now.AddDays(3),Value = 9},
new MyObject {ID= 1,Date = DateTime.Now.AddDays(4),Value = 12},
new MyObject {ID= 1,Date = DateTime.Now.AddDays(5),Value = 25},
new MyObject {ID= 2,Date = DateTime.Now,Value = 10},
new MyObject {ID= 2,Date = DateTime.Now.AddDays(1),Value = 7},
new MyObject {ID= 2,Date = DateTime.Now.AddDays(2),Value = 19},
new MyObject {ID= 2,Date = DateTime.Now.AddDays(3),Value = 12},
new MyObject {ID= 2,Date = DateTime.Now.AddDays(4),Value = 15},
new MyObject {ID= 2,Date = DateTime.Now.AddDays(5),Value = 18}
};
Console.WriteLine(list);
Console.ReadLine();
}
}
}
One option (for LINQ to Objects) would be to create your own LINQ operator:
// I don't like this name :(
public static IEnumerable<TResult> SelectWithPrevious<TSource, TResult>
(this IEnumerable<TSource> source,
Func<TSource, TSource, TResult> projection)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
TSource previous = iterator.Current;
while (iterator.MoveNext())
{
yield return projection(previous, iterator.Current);
previous = iterator.Current;
}
}
}
This enables you to perform your projection using only a single pass of the source sequence, which is always a bonus (imagine running it over a large log file).
Note that it will project a sequence of length n into a sequence of length n-1 - you may want to prepend a "dummy" first element, for example. (Or change the method to include one.)
Here's an example of how you'd use it:
var query = list.SelectWithPrevious((prev, cur) =>
new { ID = cur.ID, Date = cur.Date, DateDiff = (cur.Date - prev.Date).Days });
Note that this will include the final result of one ID with the first result of the next ID... you may wish to group your sequence by ID first.
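A sketch of the "group by ID first" idea, computing the Value difference the question asks for. To stay self-contained it uses a plain loop per group instead of the SelectWithPrevious operator, and it emits 0 for the first row of each ID; the sample dates are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int ID { get; set; }
    public DateTime Date { get; set; }
    public int Value { get; set; }
}

public static class DiffPerId
{
    // For each ID, orders by date and reports Value minus the previous Value (0 for the first row).
    public static List<(int ID, DateTime Date, int DiffToPrev)> Compute(IEnumerable<MyObject> source)
    {
        var result = new List<(int ID, DateTime Date, int DiffToPrev)>();
        foreach (var group in source.GroupBy(o => o.ID))
        {
            MyObject previous = null;
            foreach (var current in group.OrderBy(o => o.Date))
            {
                int diff = previous == null ? 0 : current.Value - previous.Value;
                result.Add((current.ID, current.Date, diff));
                previous = current;
            }
        }
        return result;
    }

    public static void Main()
    {
        var start = new DateTime(2024, 1, 1);
        var list = new List<MyObject>
        {
            new MyObject { ID = 1, Date = start, Value = 5 },
            new MyObject { ID = 1, Date = start.AddDays(1), Value = 8 },
            new MyObject { ID = 1, Date = start.AddDays(2), Value = 15 },
            new MyObject { ID = 2, Date = start, Value = 10 },
            new MyObject { ID = 2, Date = start.AddDays(1), Value = 7 },
        };

        foreach (var row in Compute(list))
        {
            Console.WriteLine($"ID={row.ID}, DiffToPrev={row.DiffToPrev}");
        }
        // ID=1, DiffToPrev=0
        // ID=1, DiffToPrev=3
        // ID=1, DiffToPrev=7
        // ID=2, DiffToPrev=0
        // ID=2, DiffToPrev=-3
    }
}
```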
Use index to get previous object:
var LinqList = list.Select(
(myObject, index) =>
new {
ID = myObject.ID,
Date = myObject.Date,
Value = myObject.Value,
DiffToPrev = (index > 0 ? myObject.Value - list[index - 1].Value : 0)
}
);
In .NET 4.0 and later (C# 4) you can use the Zip method to process two items at a time, like this:
var list1 = list.Take(list.Count() - 1);
var list2 = list.Skip(1);
var diff = list1.Zip(list2, (item1, item2) => ...);
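A complete sketch of the Zip idea on plain integers (the values are taken from the question's ID 1 rows). Because Zip stops at the end of the shorter sequence, the Take(Count - 1) step is optional here; the helper name Differences is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipDiffDemo
{
    // Pairs each element with its successor and returns next - current.
    public static List<int> Differences(IReadOnlyList<int> values) =>
        values.Zip(values.Skip(1), (current, next) => next - current).ToList();

    public static void Main()
    {
        var values = new List<int> { 5, 8, 15, 9, 12, 25 };
        Console.WriteLine(string.Join(", ", Differences(values))); // 3, 7, -6, 3, 13
    }
}
```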
Modification of Jon Skeet's answer to not skip the first item:
public static IEnumerable<TResult> SelectWithPrev<TSource, TResult>
(this IEnumerable<TSource> source,
Func<TSource, TSource, bool, TResult> projection)
{
using (var iterator = source.GetEnumerator())
{
var isfirst = true;
var previous = default(TSource);
while (iterator.MoveNext())
{
yield return projection(iterator.Current, previous, isfirst);
isfirst = false;
previous = iterator.Current;
}
}
}
A few key differences... passes a third bool parameter to indicate if it is the first element of the enumerable. I also switched the order of the current/previous parameters.
Here's the matching example:
var query = list.SelectWithPrev((cur, prev, isfirst) =>
new {
ID = cur.ID,
Date = cur.Date,
DateDiff = isfirst ? 0 : (cur.Date - prev.Date).Days
});
Further to Felix Ungman's post above, below is an example of how you can achieve the data you need making use of Zip():
var diffs = list.Skip(1).Zip(list,
(curr, prev) => new { CurrentID = curr.ID, PreviousID = prev.ID, CurrDate = curr.Date, PrevDate = prev.Date, DiffToPrev = curr.Date.Day - prev.Date.Day })
.ToList();
diffs.ForEach(fe => Console.WriteLine(string.Format("Current ID: {0}, Previous ID: {1} Current Date: {2}, Previous Date: {3} Diff: {4}",
fe.CurrentID, fe.PreviousID, fe.CurrDate, fe.PrevDate, fe.DiffToPrev)));
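Basically, you are zipping two versions of the same list, but the first version (the current list) begins at the 2nd element in the collection; otherwise each element would be compared with itself, always giving a difference of zero. Note that Date.Day is the day of the month, so DiffToPrev is only the elapsed number of days when both dates fall in the same month; (curr.Date - prev.Date).Days gives the elapsed days in general.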
Yet another mod on Jon Skeet's version (thanks for your solution +1). Except this is returning an enumerable of tuples.
public static IEnumerable<Tuple<T, T>> Intermediate<T>(this IEnumerable<T> source)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
T previous = iterator.Current;
while (iterator.MoveNext())
{
yield return new Tuple<T, T>(previous, iterator.Current);
previous = iterator.Current;
}
}
}
This is NOT returning the first because it's about returning the intermediate between items.
use it like:
public class MyObject
{
public int ID { get; set; }
public DateTime Date { get; set; }
public int Value { get; set; }
}
var myObjectList = new List<MyObject>();
// don't forget to order on `Date`
foreach(var deltaItem in myObjectList.Intermediate())
{
var delta = deltaItem.Item2.Value - deltaItem.Item1.Value;
// ..
}
OR
var newList = myObjectList.Intermediate().Select(item => item.Item2.Date - item.Item1.Date);
OR (like jon shows)
var newList = myObjectList.Intermediate().Select(item => new
{
ID = item.Item2.ID,
Date = item.Item2.Date,
DateDiff = (item.Item2.Date - item.Item1.Date).Days
});
Here is the refactored code with C# 7.2 using the readonly struct and the ValueTuple (also struct).
I use Zip() to create (CurrentID, PreviousID, CurrDate, PrevDate, DiffToPrev) tuple of 5 members. It is easily iterated with foreach:
foreach(var (CurrentID, PreviousID, CurrDate, PrevDate, DiffToPrev) in diffs)
The full code:
public readonly struct S
{
public int ID { get; }
public DateTime Date { get; }
public int Value { get; }
public S(S other) => this = other;
public S(int id, DateTime date, int value)
{
ID = id;
Date = date;
Value = value;
}
public static void DumpDiffs(IEnumerable<S> list)
{
// Zip (or compare) list with offset 1 - Skip(1) - vs the original list
// this way the items compared are i[j+1] vs i[j]
// Note: the resulting enumeration will include list.Count-1 items
var diffs = list.Skip(1)
.Zip(list, (curr, prev) =>
(CurrentID: curr.ID, PreviousID: prev.ID,
CurrDate: curr.Date, PrevDate: prev.Date,
DiffToPrev: curr.Date.Day - prev.Date.Day));
foreach(var (CurrentID, PreviousID, CurrDate, PrevDate, DiffToPrev) in diffs)
Console.WriteLine($"Current ID: {CurrentID}, Previous ID: {PreviousID} " +
$"Current Date: {CurrDate}, Previous Date: {PrevDate} " +
$"Diff: {DiffToPrev}");
}
}
Unit test output:
// the list:
// ID Date
// ---------------
// 233 17-Feb-19
// 122 31-Mar-19
// 412 03-Mar-19
// 340 05-May-19
// 920 15-May-19
// CurrentID PreviousID CurrentDate PreviousDate Diff (days)
// ---------------------------------------------------------
// 122 233 31-Mar-19 17-Feb-19 14
// 412 122 03-Mar-19 31-Mar-19 -28
// 340 412 05-May-19 03-Mar-19 2
// 920 340 15-May-19 05-May-19 10
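Note: a small readonly struct can avoid heap allocations and defensive copies, so it may perform better than a class here. Also note that DiffToPrev above subtracts day-of-month values (Date.Day), which is why 17-Feb-19 to 31-Mar-19 shows 14; use (curr.Date - prev.Date).Days if you need elapsed days.
Thanks @FelixUngman and @DavidHuxtable for their Zip() ideas!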

Resources