How do I transfer this logic into a LINQ statement? - linq

I can't get this bit of logic converted into a Linq statement and it is driving me nuts. I have a list of items that have a category and a createdondate field. I want to group by the category and only return items that have the max date for their category.
So for example, the list contains items with categories 1 and 2. The first day (1/1) I post two items to both categories 1 and 2. The second day (1/2) I post three items to category 1. The list should return the second day postings to category 1 and the first day postings to category 2.
Right now I have it grouping by the category, then running through a foreach loop to compare each item in the group with the max date of the group; if the date is less than the max date, it removes the item.
There's got to be a way to take the loop out, but I haven't figured it out!

You can do something like this:
from item in list
group item by item.Category into g
select g.OrderByDescending(it => it.CreationDate).First();
However, it's not very efficient, because it sorts the items of each group, which is more work than necessary (you don't actually need to sort, you just need to scan the list once). So I created this extension method to find the item with the max value of a property (or function):
public static T WithMax<T, TValue>(this IEnumerable<T> source, Func<T, TValue> selector)
{
    var max = default(TValue);
    var withMax = default(T);
    var comparer = Comparer<TValue>.Default;
    bool first = true;
    foreach (var item in source)
    {
        var value = selector(item);
        // The first item always becomes the initial maximum.
        if (first || comparer.Compare(value, max) > 0)
        {
            max = value;
            withMax = item;
        }
        first = false;
    }
    // Returns default(T) when the source is empty.
    return withMax;
}
You can use it as follows:
from item in list
group item by item.Category into g
select g.WithMax(it => it.CreationDate);
UPDATE: As Anthony noted in his comment, this code doesn't exactly answer the question. If you want all items whose date is the maximum of their category, you can do something like this:
from item in list
group item by item.Category into g
let maxDate = g.Max(it => it.CreationDate)
select new
{
Category = g.Key,
Items = g.Where(it => it.CreationDate == maxDate)
};

How about this:
private class Test
{
public string Category { get; set; }
public DateTime PostDate { get; set; }
public string Post { get; set; }
}
private void Form1_Load(object sender, EventArgs e)
{
List<Test> test = new List<Test>();
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 5, 12, 0, 0), Post = "A1" });
test.Add(new Test() { Category = "B", PostDate = new DateTime(2010, 5, 5, 13, 0, 0), Post = "B1" });
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 6, 12, 0, 0), Post = "A2" });
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 6, 13, 0, 0), Post = "A3" });
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 6, 14, 0, 0), Post = "A4" });
var q = test.GroupBy(t => t.Category).Select(g => new { grp = g, max = g.Max(t2 => t2.PostDate).Date }).SelectMany(x => x.grp.Where(t => t.PostDate >= x.max));
}

Reformatting luc's excellent answer to query comprehension form. I like this better for this kind of query because the scoping rules let me write more concisely.
from item in source
group item by item.Category into g
let max = g.Max(item2 => item2.PostDate).Date
from item3 in g
where item3.PostDate.Date == max
select item3;
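To see the query above against the scenario from the question, here is a minimal, self-contained sketch. It assumes C# 9 or later (top-level statements) and uses tuples with hypothetical field names (Category, PostDate, Title) in place of the asker's item type, which the question does not show.

```csharp
using System;
using System.Linq;

// Sample data mirroring the question: two posts per category on day one,
// then three more posts to category 1 on day two.
var day1 = new DateTime(2010, 1, 1);
var day2 = new DateTime(2010, 1, 2);
var posts = new[]
{
    (Category: 1, PostDate: day1.AddHours(9),  Title: "1-a"),
    (Category: 1, PostDate: day1.AddHours(10), Title: "1-b"),
    (Category: 2, PostDate: day1.AddHours(9),  Title: "2-a"),
    (Category: 2, PostDate: day1.AddHours(11), Title: "2-b"),
    (Category: 1, PostDate: day2.AddHours(8),  Title: "1-c"),
    (Category: 1, PostDate: day2.AddHours(9),  Title: "1-d"),
    (Category: 1, PostDate: day2.AddHours(10), Title: "1-e"),
};

// Group by category, find the latest calendar day in each group,
// then keep every post in the group that falls on that day.
var latest =
    (from post in posts
     group post by post.Category into g
     let maxDay = g.Max(x => x.PostDate).Date
     from item in g
     where item.PostDate.Date == maxDay
     select item).ToList();

foreach (var row in latest)
    Console.WriteLine($"{row.Category} {row.PostDate:d} {row.Title}");
```

This keeps the three day-two posts for category 1 and the two day-one posts for category 2. Note that it compares on the calendar day (.Date), so posts made at different times on the latest day are all kept.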

Related

LINQ DistinctBy choosing what object to keep

I have a list of objects and I don't want to allow duplicates of a certain attribute of the objects. My understanding is that I can use DistinctBy() to remove one of the objects. My question is, how do I choose which of the objects with the same attribute value to keep?
Example:
How would I go about removing any objects with a duplicate value of "year" in the list tm and keep the object with the highest value of someValue?
class TestModel{
public int year{ get; set; }
public int someValue { get; set; }
}
List<TestModel> tm = new List<TestModel>();
//populate list
//I was thinking something like this
tm.DistinctBy(x => x.year).Select(x => max(X=>someValue))
You can use GroupBy and Aggregate (LINQ had no built-in MaxBy method before .NET 6):
tm
.GroupBy(x => x.year)
.Select(g => g.Aggregate((acc, next) => acc.someValue > next.someValue ? acc : next))
Use the GroupBy followed by the SelectMany/Take(1) pattern with an OrderByDescending:
IEnumerable<TestModel> result =
tm
.GroupBy(x => x.year)
.SelectMany(xs =>
xs
.OrderByDescending(x => x.someValue)
.Take(1));
Here's an example:
List<TestModel> tm = new List<TestModel>()
{
new TestModel() { year = 2020, someValue = 5 },
new TestModel() { year = 2020, someValue = 15 },
new TestModel() { year = 2019, someValue = 6 },
};
That gives me two items: year = 2020 with someValue = 15, and year = 2019 with someValue = 6.
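On .NET 6 or later, the same "keep the highest per group" logic can be written with the built-in Enumerable.MaxBy. The sketch below is one possible variant, not what either answer used; it assumes C# 9 top-level statements and uses tuples instead of the TestModel class so the snippet stands alone.

```csharp
using System;
using System.Linq;

// The sample data from the example above, as tuples.
var tm = new[]
{
    (year: 2020, someValue: 5),
    (year: 2020, someValue: 15),
    (year: 2019, someValue: 6),
};

// Requires .NET 6 or later for Enumerable.MaxBy.
// One group per year; MaxBy picks the element with the largest someValue.
var kept = tm
    .GroupBy(x => x.year)
    .Select(g => g.MaxBy(x => x.someValue))
    .ToList();

foreach (var row in kept)
    Console.WriteLine($"{row.year}: {row.someValue}");
// 2020: 15
// 2019: 6
```

Unlike the OrderByDescending/Take(1) pattern, MaxBy scans each group once without sorting it.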

Comparing two lists with multiple conditions

I have two different lists of the same type. I want to compare both lists and get the values that are not matched.
List of class:
public class pre
{
public int id {get; set;}
public datetime date {get; set;}
public int sID {get; set;}
}
Two lists :
List<pre> pre1 = new List<pre>();
List<pre> pre2 = new List<pre>();
Query which I wrote to get the unmatched values:
var preResult = pre1.where(p1 => !pre
.any(p2 => p2.id == p1.id && p2.date == p1.date && p2.sID == p1sID));
But the result is wrong here. I am getting all the values in pre1.
Here is a solution:
class Program
{
static void Main(string[] args)
{
var pre1 = new List<pre>()
{
new pre {id = 1, date =DateTime.Now.Date, sID=1 },
new pre {id = 7, date = DateTime.Now.Date, sID = 2 },
new pre {id = 9, date = DateTime.Now.Date, sID = 3 },
new pre {id = 13, date = DateTime.Now.Date, sID = 4 },
// ... etc ...
};
var pre2 = new List<pre>()
{
new pre {id = 1, date =DateTime.Now.Date, sID=1 },
// ... etc ...
};
var preResult = pre1.Where(p1 => !pre2.Any(p2 => p2.id == p1.id && p2.date == p1.date && p2.sID == p1.sID)).ToList();
Console.ReadKey();
}
}
Note: the date property contains only the date; the time part will be 00:00:00.
I fixed some typos and tested your code with sensible values, and your code would correctly select unmatched records. As prabhakaran S's answer mentions, perhaps your date values include time components that differ. You will need to check your data and decide how to proceed.
However, a better way to select unmatched records from one list compared against another would be to utilize a left join technique common to working with relational databases, which you can also do in Linq against in-memory collections. It will scale better as the sizes of your inputs grow.
var preResult = from p1 in pre1
join p2 in pre2
on new { p1.id, p1.date, p1.sID }
equals new { p2.id, p2.date, p2.sID } into grp
from item in grp.DefaultIfEmpty()
where item == null
select p1;
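The point about differing time components is easy to reproduce. The following sketch (C# 9 top-level statements, with tuples standing in for the pre class and made-up sample values) runs the question's Where/Any filter twice: once on the full DateTime and once on the calendar date only.

```csharp
using System;
using System.Linq;

var pre1 = new[]
{
    (id: 1, date: new DateTime(2020, 1, 1, 8, 30, 0), sID: 1),
    (id: 7, date: new DateTime(2020, 1, 1, 9, 0, 0),  sID: 2),
};
var pre2 = new[]
{
    // Same calendar day as the first entry in pre1, different time of day.
    (id: 1, date: new DateTime(2020, 1, 1, 17, 45, 0), sID: 1),
};

// Comparing full DateTime values: the time parts differ, so nothing matches
// and every row of pre1 is reported as unmatched.
var strict = pre1
    .Where(p1 => !pre2.Any(p2 => p2.id == p1.id && p2.date == p1.date && p2.sID == p1.sID))
    .ToList();

// Comparing only the calendar date: the first row now matches and drops out.
var byDay = pre1
    .Where(p1 => !pre2.Any(p2 => p2.id == p1.id && p2.date.Date == p1.date.Date && p2.sID == p1.sID))
    .ToList();

Console.WriteLine($"{strict.Count} unmatched with time, {byDay.Count} unmatched by day");
// 2 unmatched with time, 1 unmatched by day
```

If "all of pre1 comes back" is the symptom, checking whether the stored dates carry a time part is a reasonable first step.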

Find / Count Redundant Records in a List<T>

I am looking for a way to identify duplicate records...only I want / expect to see them.
So the records aren't duplicated completely but the unique fields I am unconcerned with at this point. I just want to see if they have made X# payments of the exact same amount, via the exact same card, to the exact same person. (Bogus example just to illustrate)
The collection is a List<>; further, whatever X# is, the List<>.Count will be X#. In other words, all the records in the list match (again, just on the fields I am concerned with) or I will reject it.
The best I can come up with is to take the first record, get the value of, say, PayAmount, and LINQ the other two to see if they have the same PayAmount value, then repeat for all fields to be matched. This seems horribly inefficient, but I am not smart enough to think of a better way.
So any thoughts, ideas, pointers would be greatly appreciated.
JB
Something like this should do it.
var duplicates = list.GroupBy(x => new { x.Amount, x.CardNumber, x.PersonName })
.Where(x => x.Count() > 1);
Working example:
class Program
{
static void Main(string[] args)
{
List<Entry> table = new List<Entry>();
var dup1 = new Entry
{
Name = "David",
CardNumber = 123456789,
PaymentAmount = 70.00M
};
var dup2 = new Entry
{
Name = "Daniel",
CardNumber = 987654321,
PaymentAmount = 45.00M
};
//3 duplicates
table.Add(dup1);
table.Add(dup1);
table.Add(dup1);
//2 duplicates
table.Add(dup2);
table.Add(dup2);
//Find duplicates query
var query = from p in table
group p by new { p.Name, p.CardNumber, p.PaymentAmount } into g
where g.Count() > 1
select new
{
name = g.Key.Name,
cardNumber = g.Key.CardNumber,
amount = g.Key.PaymentAmount,
count = g.Count()
};
foreach (var item in query)
{
Console.WriteLine("{0}, {1}, {2}, {3}", item.name, item.cardNumber, item.amount, item.count);
}
Console.ReadKey();
}
}
public class Entry
{
public string Name { get; set; }
public int CardNumber { get; set; }
public decimal PaymentAmount { get; set; }
}
The meat of which is this:
var query = from p in table
group p by new { p.Name, p.CardNumber, p.PaymentAmount } into g
where g.Count() > 1
select new
{
name = g.Key.Name,
cardNumber = g.Key.CardNumber,
amount = g.Key.PaymentAmount,
count = g.Count()
};
Your unique entries are based on the three criteria of Name, Card Number, and Payment Amount, so you group by them and then use .Count() to count how many records share each unique combination. where g.Count() > 1 filters the groups down to duplicates only.
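For the stricter requirement in the question (every record in the list must agree on the fields of interest, otherwise the list is rejected), a related sketch is to project onto those fields and count the distinct keys. This is a variation on the grouping idea rather than code from the answers; it assumes C# 9 top-level statements, and the Id field is a hypothetical stand-in for the columns the asker does not care about.

```csharp
using System;
using System.Linq;

var payments = new[]
{
    (Id: 1, Name: "David", CardNumber: 123456789, PaymentAmount: 70.00m),
    (Id: 2, Name: "David", CardNumber: 123456789, PaymentAmount: 70.00m),
    (Id: 3, Name: "David", CardNumber: 123456789, PaymentAmount: 70.00m),
};

// Project each record onto the fields that matter, then count the distinct keys.
// Exactly one distinct key means every record agrees on those fields.
bool allMatch = payments
    .Select(p => (p.Name, p.CardNumber, p.PaymentAmount))
    .Distinct()
    .Count() == 1;

Console.WriteLine(allMatch); // True
```

An empty list yields zero distinct keys, so it is treated as a non-match here; adjust that if an empty list should be accepted.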

How to get the Max() of a Count() with LINQ

I'm new to LINQ and I have this situation. I have this table:
ID Date Range
1 10/10/10 9-10
2 10/10/10 9-10
3 10/10/10 9-10
4 10/10/10 8-9
5 10/11/10 1-2
6 10/11/10 1-2
7 10/12/10 5-6
I just want to list the maximum number of rows per date by range, like this:
Date Range Total
10/10/10 9-10 3
10/11/10 1-2 2
10/12/10 5-6 1
I want to do this by using LINQ, do you have any ideas of how to do this?
I think something along these lines should work:
List<MyTable> items = GetItems();
var orderedByMax = from i in items
group i by i.Date into g
let q = g.GroupBy(i => i.Range)
.Select(g2 => new {Range = g2.Key, Count = g2.Count()})
.OrderByDescending(i => i.Count)
let max = q.FirstOrDefault()
select new {
Date = g.Key,
Range = max.Range,
Total = max.Count
};
Using extension methods:
List<MyTable> items = GetItems();
var rangeTotals = items.GroupBy(x => new { x.Date, x.Range }) // Group by Date + Range
.Select(g => new {
Date = g.Key.Date,
Range = g.Key.Range,
Total = g.Count() // Count total of identical ranges per date
});
var rangeMaxTotals = rangeTotals.Where(rt => !rangeTotals.Any(z => z.Date == rt.Date && z.Total > rt.Total)); // Get maximum totals for each date
Unfortunately I can't test this at the moment, but give this a try:
List<MyTable> items = GetItems();
items.Max(t=>t.Range.Distinct().Count());
This approach:
1) Groups by Date
2) For each Date, groups by Range and calculates the Total
3) For each Date, selects the item with the greatest Total
4) You end up with your result
public sealed class Program
{
public static void Main(string[] args)
{
var items = new[]
{
new { ID = 1, Date = new DateTime(2010, 10, 10), Range = "9-10" },
new { ID = 2, Date = new DateTime(2010, 10, 10), Range = "9-10" },
new { ID = 3, Date = new DateTime(2010, 10, 10), Range = "9-10" },
new { ID = 4, Date = new DateTime(2010, 10, 10), Range = "8-9" },
new { ID = 5, Date = new DateTime(2010, 10, 11), Range = "1-2" },
new { ID = 6, Date = new DateTime(2010, 10, 11), Range = "1-2" },
new { ID = 7, Date = new DateTime(2010, 10, 12), Range = "5-6" },
};
var itemsWithTotals = items
.GroupBy(item => item.Date) // Group by Date.
.Select(groupByDate => groupByDate
.GroupBy(item => item.Range) // Group by Range.
.Select(groupByRange => new
{
Date = groupByDate.Key,
Range = groupByRange.Key,
Total = groupByRange.Count()
}) // Got the totals for each grouping.
.MaxElement(item => item.Total)); // For each Date, grab the item (grouped by Range) with the greatest Total.
foreach (var item in itemsWithTotals)
Console.WriteLine("{0} {1} {2}", item.Date.ToShortDateString(), item.Range, item.Total);
Console.Read();
}
}
/// <summary>
/// From the book LINQ in Action, Listing 5.35.
/// </summary>
static class ExtensionMethods
{
public static TElement MaxElement<TElement, TData>(this IEnumerable<TElement> source, Func<TElement, TData> selector) where TData : IComparable<TData>
{
if (source == null)
throw new ArgumentNullException("source");
if (selector == null)
throw new ArgumentNullException("selector");
bool firstElement = true;
TElement result = default(TElement);
TData maxValue = default(TData);
foreach (TElement element in source)
{
var candidate = selector(element);
if (firstElement || (candidate.CompareTo(maxValue) > 0))
{
firstElement = false;
maxValue = candidate;
result = element;
}
}
return result;
}
}
According to LINQ in Action (Chapter 5.3.3 - Will LINQ to Objects hurt the performance of my code?), using the MaxElement extension method is one of the most efficient approaches. I think the performance would be O(4n), which is still linear: one pass for the first GroupBy, a second for the second GroupBy, a third for the Count(), and a fourth for the loop within MaxElement.
DrDro's approach is going to be more like O(n^2) since it loops the entire list for each item in the list.
StriplingWarrior's approach is going to be closer to O(n log n) because it sorts the items. Though I'll admit, there may be some crazy magic in there that I don't understand.

Linq - 'Saving' OrderBy operation (c#)

Assume I have a generic list L of some type in C#. Then, using LINQ, I call OrderBy() on it, passing in a lambda expression.
If I then re-assign L, the previous order operation will obviously be lost.
Is there any way I can 'save' the lambda expression I used on the list before I reassigned it, and re-apply it?
Use a Func delegate to store your ordering then pass that to the OrderBy method:
Func<int, int> orderFunc = i => i; // func for ordering
var list = Enumerable.Range(1,10).OrderByDescending(i => i); // 10, 9 ... 1
var newList = list.OrderBy(orderFunc); // 1, 2 ... 10
As another example consider a Person class:
public class Person
{
public int Id { get; set; }
public string Name { get; set; }
}
Now you want to preserve a sort order that sorts by the Name property. In this case the Func operates on a Person type (T) and the TResult will be a string since Name is a string and is what you are sorting by.
Func<Person, string> nameOrder = p => p.Name;
var list = new List<Person>
{
new Person { Id = 1, Name = "ABC" },
new Person { Id = 2, Name = "DEF" },
new Person { Id = 3, Name = "GHI" },
};
// descending order by name
foreach (var p in list.OrderByDescending(nameOrder))
Console.WriteLine(p.Id + ":" + p.Name);
// 3:GHI
// 2:DEF
// 1:ABC
// re-assigning the list
list = new List<Person>
{
new Person { Id = 23, Name = "Foo" },
new Person { Id = 14, Name = "Buzz" },
new Person { Id = 50, Name = "Bar" },
};
// reusing the order function (ascending by name in this case)
foreach (var p in list.OrderBy(nameOrder))
Console.WriteLine(p.Id + ":" + p.Name);
// 50:Bar
// 14:Buzz
// 23:Foo
EDIT: be sure to add ToList() after the OrderBy calls if you need a List<T> since the LINQ methods will return an IEnumerable<T>.
Calling ToList() or ToArray() on your IEnumerable<T> will cause it to be immediately evaluated. You can then assign the resulting list or array to "save" your ordered list.
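The difference between a deferred OrderBy query and a materialized list can be shown in a few lines. This is a minimal sketch assuming C# 9 top-level statements; the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 3, 1, 2 };

// Deferred: the query remembers the source and the ordering; nothing is sorted yet.
IEnumerable<int> deferred = numbers.OrderBy(n => n);

// Materialized: the sort runs now and the result is copied into a new list.
List<int> snapshot = numbers.OrderBy(n => n).ToList();

numbers.Add(0);

Console.WriteLine(string.Join(", ", deferred)); // 0, 1, 2, 3 (sees the later Add)
Console.WriteLine(string.Join(", ", snapshot)); // 1, 2, 3
```

So ToList() "saves" the ordered result, while a stored Func saves the ordering rule itself for reuse on other lists.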

Resources