How to get the Max() of a Count() with LINQ

I'm new to LINQ and I have this situation. I have this table:
ID  Date      Range
1   10/10/10  9-10
2   10/10/10  9-10
3   10/10/10  9-10
4   10/10/10  8-9
5   10/11/10  1-2
6   10/11/10  1-2
7   10/12/10  5-6
I just want to list the maximum number of rows per date by range, like this:
Date      Range  Total
10/10/10  9-10   3
10/11/10  1-2    2
10/12/10  5-6    1
I want to do this using LINQ. Do you have any ideas on how to do it?

I think something along these lines should work:
List<MyTable> items = GetItems();
var orderedByMax = from i in items
                   group i by i.Date into g
                   let q = g.GroupBy(i => i.Range)
                            .Select(g2 => new { Range = g2.Key, Count = g2.Count() })
                            .OrderByDescending(i => i.Count)
                   let max = q.FirstOrDefault()
                   select new {
                       Date = g.Key,
                       Range = max.Range,
                       Total = max.Count
                   };

Using extension methods:
List<MyTable> items = GetItems();
// Group by Date + Range and count the identical ranges per date.
var rangeTotals = items.GroupBy(x => new { x.Date, x.Range })
                       .Select(g => new {
                           Date = g.Key.Date,
                           Range = g.Key.Range,
                           Total = g.Count()
                       });
// Keep only the rows for which no other row on the same date has a greater total.
var rangeMaxTotals = rangeTotals.Where(rt => !rangeTotals.Any(z => z.Date == rt.Date && z.Total > rt.Total));

Unfortunately I can't test this at the moment, but give this a try:
List<MyTable> items = GetItems();
items.Max(t=>t.Range.Distinct().Count());

This approach:
1) Groups by Date
2) For each Date, groups by Range and calculates the Total
3) For each Date, selects the item with the greatest Total
4) You end up with your result
using System;
using System.Collections.Generic;
using System.Linq;
public sealed class Program
{
    public static void Main(string[] args)
    {
        var items = new[]
        {
            new { ID = 1, Date = new DateTime(2010, 10, 10), Range = "9-10" },
            new { ID = 2, Date = new DateTime(2010, 10, 10), Range = "9-10" },
            new { ID = 3, Date = new DateTime(2010, 10, 10), Range = "9-10" },
            new { ID = 4, Date = new DateTime(2010, 10, 10), Range = "8-9" },
            new { ID = 5, Date = new DateTime(2010, 10, 11), Range = "1-2" },
            new { ID = 6, Date = new DateTime(2010, 10, 11), Range = "1-2" },
            new { ID = 7, Date = new DateTime(2010, 10, 12), Range = "5-6" },
        };
        var itemsWithTotals = items
            .GroupBy(item => item.Date) // Group by Date.
            .Select(groupByDate => groupByDate
                .GroupBy(item => item.Range) // Group by Range.
                .Select(groupByRange => new
                {
                    Date = groupByDate.Key,
                    Range = groupByRange.Key,
                    Total = groupByRange.Count()
                }) // Got the totals for each grouping.
                .MaxElement(item => item.Total)); // For each Date, grab the item (grouped by Range) with the greatest Total.
        foreach (var item in itemsWithTotals)
            Console.WriteLine("{0} {1} {2}", item.Date.ToShortDateString(), item.Range, item.Total);
        Console.Read();
    }
}
/// <summary>
/// From the book LINQ in Action, Listing 5.35.
/// </summary>
static class ExtensionMethods
{
    public static TElement MaxElement<TElement, TData>(this IEnumerable<TElement> source, Func<TElement, TData> selector) where TData : IComparable<TData>
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (selector == null)
            throw new ArgumentNullException("selector");
        bool firstElement = true;
        TElement result = default(TElement);
        TData maxValue = default(TData);
        foreach (TElement element in source)
        {
            var candidate = selector(element);
            // Keep the first element unconditionally, then any element with a strictly greater value.
            if (firstElement || (candidate.CompareTo(maxValue) > 0))
            {
                firstElement = false;
                maxValue = candidate;
                result = element;
            }
        }
        return result;
    }
}
According to LINQ in Action (section 5.3.3, "Will LINQ to Objects hurt the performance of my code?"), using the MaxElement extension method is one of the most efficient approaches. I think the cost is roughly four linear passes, which is still O(n): one for the first GroupBy, a second for the second GroupBy, a third for the Count() calls, and a fourth for the loop within MaxElement.
DrDro's approach is going to be more like O(n^2) since it loops the entire list for each item in the list.
StriplingWarrior's approach is going to be closer to O(n log n) because it sorts the items. Though I'll admit, there may be some crazy magic in there that I don't understand.
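On .NET 6 or later, the built-in Enumerable.MaxBy can take the place of the hand-written MaxElement helper. The following is only a sketch of that idea, not code from any of the answers above: the RangeTotals class, the MaxPerDate method, and the tuple shapes are names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeTotals
{
    // Hypothetical helper: for each date, returns the range that has the most rows.
    // Assumes .NET 6+ (Enumerable.MaxBy) and that rows is non-null.
    public static List<(DateTime Date, string Range, int Total)> MaxPerDate(
        IEnumerable<(DateTime Date, string Range)> rows)
    {
        return rows
            .GroupBy(r => r.Date)                 // one group per date
            .Select(byDate => byDate
                .GroupBy(r => r.Range)            // within a date, one group per range
                .Select(byRange => (Date: byDate.Key, Range: byRange.Key, Total: byRange.Count()))
                .MaxBy(t => t.Total))             // keep the range with the greatest total
            .ToList();
    }
}
```

Called with the seven rows from the question, this should yield one entry per date, in the order the dates first appear.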

Related

How to use LINQ to find all items in list which have the most members in another list?

Given:
class Item {
public int[] SomeMembers { get; set; }
}
var items = new []
{
new Item { SomeMembers = new [] { 1, 2 } }, //0
new Item { SomeMembers = new [] { 1, 2 } }, //1
new Item { SomeMembers = new [] { 1 } } //2
};
var secondList = new int[] { 1, 2, 3 };
I need to find all the Items in items with the most of its SomeMembers occurring in secondList.
In the example above I would expect Items 0 and 1 to be returned but not 2.
I know I could do it with things like loops or Contains() but it seems there must be a more elegant or efficient way?
This can be written pretty easily:
var result = items.Where(item => item.SomeMembers.Count(secondList.Contains) * 2
>= item.SomeMembers.Length);
Or possibly (I can never guess whether method group conversions will work):
var result = items.Where(item => item.SomeMembers.Count(x => secondList.Contains(x)) * 2
>= item.SomeMembers.Length);
Or to pull it out:
Func<int, bool> inSecondList = secondList.Contains;
var result = items.Where(item => item.SomeMembers.Count(inSecondList) * 2
>= item.SomeMembers.Length);
If secondList becomes large, you should consider using a HashSet<int> instead.
EDIT: To avoid evaluating SomeMembers twice, you could create an extension method:
public static bool MajoritySatisfied<T>(this IEnumerable<T> source,
Func<T, bool> condition)
{
int total = 0, satisfied = 0;
foreach (T item in source)
{
total++;
if (condition(item))
{
satisfied++;
}
}
return satisfied * 2 >= total;
}
Then:
var result = items.Where(item => item.MajoritySatisfied(secondList.Contains));
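For a large secondList, the HashSet<int> suggestion above might look like the sketch below. The MajorityFilter class and the WithMajorityIn method are hypothetical names, and the helper works on plain int[] arrays rather than the Item class so that it stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MajorityFilter
{
    // Hypothetical helper: keeps the arrays for which at least half of the
    // members occur in `allowed` (the same ">= half" rule as the queries above).
    public static List<int[]> WithMajorityIn(IEnumerable<int[]> candidates, IEnumerable<int> allowed)
    {
        // Build the set once; Contains is then O(1) on average instead of a linear scan.
        var lookup = new HashSet<int>(allowed);
        return candidates
            .Where(members => members.Count(x => lookup.Contains(x)) * 2 >= members.Length)
            .ToList();
    }
}
```

With the Item class from the question, the same idea would be items.Where(item => item.SomeMembers.Count(x => lookup.Contains(x)) * 2 >= item.SomeMembers.Length).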

How would you address this aggregation/reporting scenario based on RavenDB document data?

We're using RavenDB (2261) as the back end for a queue-based video upload system, and we've been asked to provide a 'live' SLA report on various metrics to do with the upload system.
The document format looks like this:
{
"ClipGuid": "01234567-1234-abcd-efef-123412341234",
"CustomerId": "ABC123",
"Title": "Shakespeare in Love",
"DurationInSeconds": 82,
"StateChanges": [
{
"OldState": "DoesNotExist",
"NewState": "ReceivedFromUpload",
"ChangedAt": "2013-03-15T15:38:38.7050002Z"
},
{
"OldState": "ReceivedFromUpload",
"NewState": "Validating",
"ChangedAt": "2013-03-15T15:38:38.8453975Z"
},
{
"OldState": "Validating",
"NewState": "AwaitingSubmission",
"ChangedAt": "2013-03-15T15:38:39.9529762Z"
},
{
"OldState": "AwaitingSubmission",
"NewState": "Submitted",
"ChangedAt": "2013-03-15T15:38:43.4785084Z"
},
{
"OldState": "Submitted",
"NewState": "Playable",
"ChangedAt": "2013-03-15T15:41:39.5523223Z"
}
],
}
Within each ClipInfo record, there's a collection of StateChanges; one is added each time the clip is passed from one part of the processing chain to another. What we need to do is reduce these StateChanges to two specific timespans: how long a clip took to go from DoesNotExist to AwaitingSubmission, and how long it took to go from DoesNotExist to Playable. We then need to group these durations by date/time so we can draw a simple SLA report.
The necessary predicates can be expressed as LINQ statements, but when I try specifying this sort of complex logic within a Raven query I just seem to get back empty results (or lots of DateTime.MinValue results).
I realise document databases like Raven aren't ideal for reporting - and we're happy to explore replication into SQL or some other sort of caching mechanism - but at the moment I just can't see any way of extracting the data other than doing multiple queries to retrieve the entire contents of the store and then performing the calculations in .NET.
Any recommendations?
Thanks,
Dylan
I have made some assumptions which you may need to adjust for:
You operate strictly in the UTC time zone - your "day" is midnight to midnight UTC.
Your week is Sunday through Saturday
The date you want to group by is the first status date reported (the one marked with "DoesNotExist" as its old state.)
You will need a separate map/reduce index per date bracket that you are grouping on - Daily, Weekly, Monthly.
They are almost identical, except for how the starting date is defined. If you want to get creative, you might be able to come up with a way to make these into a generic index definition - but they will always end up being three separate indexes in RavenDB.
// This is the resulting class that all of these indexes will return
public class ClipStats
{
public int CountClips { get; set; }
public int NumPassedWithinTwentyPct { get; set; }
public int NumPlayableWithinOneHour { get; set; }
public DateTime Starting { get; set; }
}
public class ClipStats_ByDay : AbstractIndexCreationTask<ClipInfo, ClipStats>
{
public ClipStats_ByDay()
{
Map = clips => from clip in clips
let state1 = clip.StateChanges.FirstOrDefault(x => x.OldState == "DoesNotExist")
let state2 = clip.StateChanges.FirstOrDefault(x => x.NewState == "AwaitingSubmission")
let state3 = clip.StateChanges.FirstOrDefault(x => x.NewState == "Playable")
let time1 = state2.ChangedAt - state1.ChangedAt
let time2 = state3.ChangedAt - state1.ChangedAt
select new
{
CountClips = 1,
NumPassedWithinTwentyPct = time1.TotalSeconds < clip.DurationInSeconds * 0.2 ? 1 : 0,
NumPlayableWithinOneHour = time2.TotalHours < 1 ? 1 : 0,
Starting = state1.ChangedAt.Date
};
Reduce = results => from result in results
group result by result.Starting
into g
select new
{
CountClips = g.Sum(x => x.CountClips),
NumPassedWithinTwentyPct = g.Sum(x => x.NumPassedWithinTwentyPct),
NumPlayableWithinOneHour = g.Sum(x => x.NumPlayableWithinOneHour),
Starting = g.Key
};
}
}
public class ClipStats_ByWeek : AbstractIndexCreationTask<ClipInfo, ClipStats>
{
public ClipStats_ByWeek()
{
Map = clips => from clip in clips
let state1 = clip.StateChanges.FirstOrDefault(x => x.OldState == "DoesNotExist")
let state2 = clip.StateChanges.FirstOrDefault(x => x.NewState == "AwaitingSubmission")
let state3 = clip.StateChanges.FirstOrDefault(x => x.NewState == "Playable")
let time1 = state2.ChangedAt - state1.ChangedAt
let time2 = state3.ChangedAt - state1.ChangedAt
select new
{
CountClips = 1,
NumPassedWithinTwentyPct = time1.TotalSeconds < clip.DurationInSeconds * 0.2 ? 1 : 0,
NumPlayableWithinOneHour = time2.TotalHours < 1 ? 1 : 0,
Starting = state1.ChangedAt.Date.AddDays(0 - (int) state1.ChangedAt.Date.DayOfWeek)
};
Reduce = results => from result in results
group result by result.Starting
into g
select new
{
CountClips = g.Sum(x => x.CountClips),
NumPassedWithinTwentyPct = g.Sum(x => x.NumPassedWithinTwentyPct),
NumPlayableWithinOneHour = g.Sum(x => x.NumPlayableWithinOneHour),
Starting = g.Key
};
}
}
public class ClipStats_ByMonth : AbstractIndexCreationTask<ClipInfo, ClipStats>
{
public ClipStats_ByMonth()
{
Map = clips => from clip in clips
let state1 = clip.StateChanges.FirstOrDefault(x => x.OldState == "DoesNotExist")
let state2 = clip.StateChanges.FirstOrDefault(x => x.NewState == "AwaitingSubmission")
let state3 = clip.StateChanges.FirstOrDefault(x => x.NewState == "Playable")
let time1 = state2.ChangedAt - state1.ChangedAt
let time2 = state3.ChangedAt - state1.ChangedAt
select new
{
CountClips = 1,
NumPassedWithinTwentyPct = time1.TotalSeconds < clip.DurationInSeconds * 0.2 ? 1 : 0,
NumPlayableWithinOneHour = time2.TotalHours < 1 ? 1 : 0,
Starting = state1.ChangedAt.Date.AddDays(1 - state1.ChangedAt.Date.Day)
};
Reduce = results => from result in results
group result by result.Starting
into g
select new
{
CountClips = g.Sum(x => x.CountClips),
NumPassedWithinTwentyPct = g.Sum(x => x.NumPassedWithinTwentyPct),
NumPlayableWithinOneHour = g.Sum(x => x.NumPlayableWithinOneHour),
Starting = g.Key
};
}
}
Then when you want to query...
var now = DateTime.UtcNow;
var today = now.Date;
var dailyStats = session.Query<ClipStats, ClipStats_ByDay>()
.FirstOrDefault(x => x.Starting == today);
var startOfWeek = today.AddDays(0 - (int) today.DayOfWeek);
var weeklyStats = session.Query<ClipStats, ClipStats_ByWeek>()
.FirstOrDefault(x => x.Starting == startOfWeek);
var startOfMonth = today.AddDays(1 - today.Day);
var monthlyStats = session.Query<ClipStats, ClipStats_ByMonth>()
.FirstOrDefault(x => x.Starting == startOfMonth);
In the results, you will have totals. So if you want percent averages for your SLA, simply divide the statistic by the count, which is also returned.
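The division mentioned above can be sketched as a small helper. SlaMath and Percent are made-up names, and returning 0 for an empty bracket is an assumption about how the report should treat a period with no clips.

```csharp
using System;

public static class SlaMath
{
    // Hypothetical helper: returns part/total as a percentage.
    // Assumption: a bracket with no clips reports 0 rather than dividing by zero.
    public static double Percent(int part, int total)
    {
        return total == 0 ? 0.0 : 100.0 * part / total;
    }
}
```

For example, SlaMath.Percent(dailyStats.NumPlayableWithinOneHour, dailyStats.CountClips) would give the daily "playable within one hour" percentage. Since the queries use FirstOrDefault, dailyStats can be null when no clips started in that bracket, so check for that first.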

Using Linq to determine the missing combination of records

Let me start to explain with an example.
var vProducts = new[] {
new { Product = "A", Location ="Y", Month = "January", Demand = 50 },
new { Product = "A", Location ="Y", Month = "February", Demand = 100 },
new { Product = "A", Location ="Y", Month = "March", Demand = 20 },
new { Product = "A", Location ="Y", Month = "June", Demand = 10 }
};
var vPeriods = new[] {
new { Priority = 1, Month = "January" },
new { Priority = 2, Month = "February" },
new { Priority = 3, Month = "March" },
new { Priority = 4, Month = "April" },
new { Priority = 5, Month = "May" },
new { Priority = 6, Month = "June" }
};
var vAll = from p in vProducts
from t in vPeriods
select new
{
Product = p.Product,
Location = p.Location,
Period = t.Priority,
PeriodName = t.Month,
Demand = p.Demand
};
The query above creates every combination of products and periods. What I need instead is a list of all products that also includes the months with no matching record, as shown below.
Example:
Product  Location  Priority  Month     Demand
A        Y         1         January   50
A        Y         2         February  100
A        Y         3         March     20
A        Y         4         April     null
A        Y         5         May       null
A        Y         6         June      10
Thanks for any comments.
You want to do a left outer join. It will go something like this:
var res = from period in vPeriods
join product in vProducts on period.Month equals product.Month into bla
from p in bla.DefaultIfEmpty()
select new { period.Month, period.Priority, Product = p == null ? null : p.Product, Demand = p == null ? -1 : p.Demand };
foreach (var a in res)
{
Console.WriteLine(string.Format("{0} {1} {2}", a.Product, a.Month, a.Demand));
}
Of course, the rows for months without a matching product won't have a location etc. (as you stated in your example).
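If every row should keep its Product and Location and only Demand should be null, as in the table from the question, one option is to pair each distinct (Product, Location) with every period first and then left-join the demand onto that. This is a sketch under that assumption; DemandGrid, Fill, and the tuple shapes are hypothetical and stand in for the anonymous types used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DemandGrid
{
    // Hypothetical helper: one row per (product, location, period), with a null
    // Demand where the product has no record for that month.
    // Note: products is enumerated more than once, so pass a materialized collection.
    public static List<(string Product, string Location, int Priority, string Month, int? Demand)> Fill(
        IEnumerable<(string Product, string Location, string Month, int Demand)> products,
        IEnumerable<(int Priority, string Month)> periods)
    {
        var query =
            from pair in products.Select(x => (x.Product, x.Location)).Distinct()
            from period in periods
            join row in products
                on (pair.Product, pair.Location, period.Month)
                equals (row.Product, row.Location, row.Month) into matches
            // DefaultIfEmpty on int? yields a single null when nothing matched.
            from demand in matches.Select(x => (int?)x.Demand).DefaultIfEmpty()
            orderby pair.Product, pair.Location, period.Priority
            select (pair.Product, pair.Location, period.Priority, period.Month, Demand: demand);
        return query.ToList();
    }
}
```

With the data from the question this should produce six rows for product A at location Y, with a null Demand for April and May.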

Simple LINQ query

I have a List of X items. I want a LINQ query that will convert it into batches (a List of Lists), where each batch has 4 items, except for the last one, which can have 1-4 (whatever the remainder is). Also, the number 4 should be configurable so it could be 5, 17, etc.
Can anyone tell me how to write that?
List<Item> myItems = ...;
List<List<Item>> myBatches = myItems.????
Thank you in advance!
If you're happy with the results being typed as IEnumerable<IEnumerable<T>> then you can do this:
int groupSize = 4;
var myBatches = myItems.Select((x, i) => new { Val = x, Idx = i })
.GroupBy(x => x.Idx / groupSize,
x => x.Val);
If you want an actual List<List<T>> then you'll need to add a couple of extra ToList calls:
int groupSize = 4;
var myBatches = myItems.Select((x, i) => new { Val = x, Idx = i })
.GroupBy(x => x.Idx / groupSize,
x => x.Val,
(k, g) => g.ToList())
.ToList();
Here is a good article about using Take and Skip to do paging, which is identical functionality to what you are requesting. It doesn't get you all of the way to a single line of LINQ, but hopefully helps.
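For reference, the Skip/Take paging idea applied to batching could look roughly like this. It is a sketch, not the code from the article; Batching and InBatches are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Hypothetical helper: splits items into consecutive batches of batchSize;
    // the last batch holds whatever remains.
    public static List<List<T>> InBatches<T>(IReadOnlyList<T> items, int batchSize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<T>>();
        for (int start = 0; start < items.Count; start += batchSize)
        {
            // Skip to the start of this "page", then take at most one page of items.
            batches.Add(items.Skip(start).Take(batchSize).ToList());
        }
        return batches;
    }
}
```

Depending on the runtime, Skip may have to walk past the skipped items each time, so this can be slower than the GroupBy version on large lists. On .NET 6 or later, the built-in Enumerable.Chunk(size) does the same job and returns arrays.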
This made me think of how we did this before LINQ.
var vessels = new List<Vessel>()
{ new Vessel() { id = 8, name = "Millennium Falcon" },
new Vessel() { id = 4, name = "Ebon Hawk" },
new Vessel() { id = 34, name = "Virago"},
new Vessel() { id = 12, name = "Naboo royal starship"},
new Vessel() { id = 17, name = "Radiant VII"},
new Vessel() { id = 7, name = "Lambda-class shuttle"},
new Vessel() { id = 23, name = "Rogue Shadow"}};
var chunksize=2;
// With LINQ
var vesselGroups = vessels.Select((v, i) => new { Vessel = v, Index = i })
.GroupBy(c => c.Index / chunksize, c => c.Vessel, (t,e)=>e.ToList())
.ToList();
// Before LINQ (most probably not optimal)
var groupedVessels = new List<List<Vessel>>();
var g = new List<Vessel>();
var chunk = chunksize;
foreach(var vessel in vessels)
{
g.Add(vessel);
chunk--;
if (chunk == 0)
{
groupedVessels.Add(g);
g = new List<Vessel>();
chunk = chunksize;
}
}
if (g.Count > 0) // avoid a trailing empty chunk when the count is an exact multiple of chunksize
groupedVessels.Add(g);

How do I transfer this logic into a LINQ statement?

I can't get this bit of logic converted into a Linq statement and it is driving me nuts. I have a list of items that have a category and a createdondate field. I want to group by the category and only return items that have the max date for their category.
So for example, the list contains items with categories 1 and 2. The first day (1/1) I post two items to both categories 1 and 2. The second day (1/2) I post three items to category 1. The list should return the second day postings to category 1 and the first day postings to category 2.
Right now I have it grouping by the category and then running through a foreach loop to compare each item in the group with the max date of the group; if the date is less than the max date, it removes the item.
There's got to be a way to take the loop out, but I haven't figured it out!
You can do something like this:
from item in list
group item by item.Category into g
select g.OrderByDescending(it => it.CreationDate).First();
However, it's not very efficient, because it sorts the items of each group, which is more work than necessary (you don't actually need to sort, you just need to scan the list once). So I created this extension method to find the item with the max value of a property (or function):
public static T WithMax<T, TValue>(this IEnumerable<T> source, Func<T, TValue> selector)
{
var max = default(TValue);
var withMax = default(T);
var comparer = Comparer<TValue>.Default;
bool first = true;
foreach (var item in source)
{
var value = selector(item);
int compare = comparer.Compare(value, max);
if (compare > 0 || first)
{
max = value;
withMax = item;
}
first = false;
}
return withMax;
}
You can use it as follows:
from item in list
group item by item.Category into g
select g.WithMax(it => it.CreationDate);
UPDATE: As Anthony noted in his comment, this code doesn't exactly answer the question. If you want all items whose date is the maximum for their category, you can do something like this:
from item in list
group item by item.Category into g
let maxDate = g.Max(it => it.CreationDate)
select new
{
Category = g.Key,
Items = g.Where(it => it.CreationDate == maxDate)
};
How about this:
private class Test
{
public string Category { get; set; }
public DateTime PostDate { get; set; }
public string Post { get; set; }
}
private void Form1_Load(object sender, EventArgs e)
{
List<Test> test = new List<Test>();
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 5, 12, 0, 0), Post = "A1" });
test.Add(new Test() { Category = "B", PostDate = new DateTime(2010, 5, 5, 13, 0, 0), Post = "B1" });
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 6, 12, 0, 0), Post = "A2" });
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 6, 13, 0, 0), Post = "A3" });
test.Add(new Test() { Category = "A", PostDate = new DateTime(2010, 5, 6, 14, 0, 0), Post = "A4" });
var q = test.GroupBy(t => t.Category).Select(g => new { grp = g, max = g.Max(t2 => t2.PostDate).Date }).SelectMany(x => x.grp.Where(t => t.PostDate >= x.max));
}
Reformatting luc's excellent answer to query comprehension form. I like this better for this kind of query because the scoping rules let me write more concisely.
from item in source
group item by item.Category into g
let max = g.Max(item2 => item2.PostDate).Date
from item3 in g
where item3.PostDate.Date == max
select item3;
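To try the comprehension form against sample data, it can be wrapped in a small method as below. This is a sketch: LatestPosts, OnLatestDay, and the tuple shape are invented for illustration and mirror the Test class from the earlier answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestPosts
{
    // Hypothetical helper: for each category, returns every post made on that
    // category's most recent day (time of day is ignored).
    public static List<(string Category, DateTime PostDate, string Post)> OnLatestDay(
        IEnumerable<(string Category, DateTime PostDate, string Post)> posts)
    {
        var query =
            from post in posts
            group post by post.Category into g
            let lastDay = g.Max(x => x.PostDate).Date   // most recent calendar day in this category
            from latest in g
            where latest.PostDate.Date == lastDay
            select latest;
        return query.ToList();
    }
}
```

Fed the five sample posts from the Form1_Load example, this should return A2, A3, A4 and B1.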
