How to calculate multiple averages in one query in linq to entities - linq

How to do this in linq to entities in one query?
SELECT avg(Column1), avg(Column2), ... from MyTable
where ColumnX = 234
??

You could do something like that:
var averages = myTable
.Where(item => item.ColumnX == 234)
.Aggregate(
new { count = 0, sum1 = 0.0, sum2 = 0.0 },
(acc, item) => new { count = acc.count + 1, sum1 = acc.sum1 + item.Column1, sum2 = acc.sum2 + item.Column2 },
acc => new { avg1 = acc.sum1 / acc.count, avg2 = acc.sum2 / acc.count });
Note the call to AsEnumerable() to force Aggregate to be executed locally (as EF probably doesn't know how to convert it to SQL) Actually it seems to work ;)
Alternatively, you could use this query:
var averages =
from item in table
where item.ColumnX == 234
group item by 1 into g
select new
{
Average1 = g.Average(i => i.Column1),
Average2 = g.Average(i => i.Column2)
};
The use of group by here is not very intuitive, but it's probably easier to read than the other solution. Not sure it can be converted to SQL though...

Related

Default values for empty groups in Linq GroupBy query

I have a data set of values that I want to summarise in groups. For each group, I want to create an array big enough to contain the values of the largest group. When a group contains less than this maximum number, I want to insert a default value of zero for the empty key values.
Dataset
Col1 Col2 Value
--------------------
A X 10
A Z 15
B X 9
B Y 12
B Z 6
Desired result
X, [10, 9]
Y, [0, 12]
Z, [15, 6]
Note that value "A" in Col1 in the dataset has no value for "Y" in Col2. Value "A" is first group in the outer series, therefore it is the first element that is missing.
The following query creates the result dataset, but does not insert the default zero values for the Y group.
result = data.GroupBy(item => item.Col2)
.Select(group => new
{
name = group.Key,
data = group.Select(item => item.Value)
.ToArray()
})
Actual result
X, [10, 9]
Y, [12]
Z, [15, 6]
What do I need to do to insert a zero as the missing group value?
Here is how I understand it.
Let say we have this
class Data
{
public string Col1, Col2;
public decimal Value;
}
Data[] source =
{
new Data { Col1="A", Col2 = "X", Value = 10 },
new Data { Col1="A", Col2 = "Z", Value = 15 },
new Data { Col1="B", Col2 = "X", Value = 9 },
new Data { Col1="B", Col2 = "Y", Value = 12 },
new Data { Col1="B", Col2 = "Z", Value = 6 },
};
First we need to determine the "fixed" part
var columns = source.Select(e => e.Col1).Distinct().OrderBy(c => c).ToList();
Then we can process with the normal grouping, but inside the group we will left join the columns with group elements which will allow us to achieve the desired behavior
var result = source.GroupBy(e => e.Col2, (key, elements) => new
{
Key = key,
Elements = (from c in columns
join e in elements on c equals e.Col1 into g
from e in g.DefaultIfEmpty()
select e != null ? e.Value : 0).ToList()
})
.OrderBy(e => e.Key)
.ToList();
It won't be pretty, but you can do something like this:
var groups = data.GroupBy(d => d.Col2, d => d.Value)
.Select(g => new { g, count = g.Count() })
.ToList();
int maxG = groups.Max(p => p.count);
var paddedGroups = groups.Select(p => new {
name = p.g.Key,
data = p.g.Concat(Enumerable.Repeat(0, maxG - p.count)).ToArray() });
You can do it like this:-
int maxCount = 0;
var result = data.GroupBy(x => x.Col2)
.OrderByDescending(x => x.Count())
.Select(x =>
{
if (maxCount == 0)
maxCount = x.Count();
var Value = x.Select(z => z.Value);
return new
{
name = x.Key,
data = maxCount == x.Count() ? Value.ToArray() :
Value.Concat(new int[maxCount - Value.Count()]).ToArray()
};
});
Code Explanation:-
Since you need to append default zeros in case when you have less items in any group, I am storing the maxCount (which any group can produce in a variable maxCount) for this I am ordering the items in descending order. Next I am storing the maximum count which the item can producr in maxCount variable. While projecting I am simply checking if number of items in the group is not equal to maxCount then create an integer array of size (maxCount - x.Count) i.e. maximum count minus number of items in current group and appending it to the array.
Working Fiddle.

Linq FirstOrDefault List inside a query

I have this linq query
var numberGroups =
from n in VISRUBs.Where(a => a.VISANA.VISITE.DATEVIS <= d && a.VISANA.VISITE.PANUM == p)
group n by n.RUBRIQUE into g
select new {
RemainderCHAPLIB = g.Key.ANALYSE.CHAPITRE.LIBELLE,
RemainderLIB = g.Key.LIBELLE,
RemainderRUNUM = g.Key.RUNUM,
vals = from vlist in g.OrderByDescending(a=>a.VISANA.VISITE.DATEVIS)
select vlist.VALEUR
};
which gives me this result in Linqpad
What I want is to select the first and second item from the last field (vals) which is a List<string>.
I have tried this:
var numberGroups =
from n in VISRUBs.Where(a => a.VISANA.VISITE.DATEVIS <= d && a.VISANA.VISITE.PANUM == p)
group n by n.RUBRIQUE into g
select new {
RemainderCHAPLIB = g.Key.ANALYSE.CHAPITRE.LIBELLE,
RemainderLIB = g.Key.LIBELLE,
RemainderRUNUM = g.Key.RUNUM,
vals = from vlist in g.OrderByDescending(a => a.VISANA.VISITE.DATEVIS)
select vlist.VALEUR
};
var lst = from n in numberGroups
select new
{
RemainderCHAPLIB = n.RemainderCHAPLIB,
RemainderLIB = n.RemainderLIB,
RemainderRUNUM = n.RemainderRUNUM,
VAL = n.vals.FirstOrDefault()
};
but it didn't work, I got an exception:
Dynamic SQL ErrorSQL error code = -104Token unknown - line 54, column 1OUTER
found it !
var lst = from n in numberGroups.ToList()
select new
{
RemainderCHAPLIB = n.RemainderCHAPLIB,
RemainderLIB = n.RemainderLIB,
RemainderRUNUM = n.RemainderRUNUM,
VAL = n.vals.FirstOrDefault(),
ANT = n.vals.Skip(1).FirstOrDefault()
};

LINQ Grouping: Is there a cleaner way to do this without a for loop

I am trying to create a very simple distribution chart and I want to display the counts of tests score percentages in their corresponding 10's ranges.
I thought about just doing the grouping on the Math.Round((d.Percentage/10-0.5),0)*10 which should give me the 10's value....but I wasn't sure the best way to do this given that I would probably have missing ranges and all ranges need to appear even if the count is zero. I also thought about doing an outer join on the ranges array but since I'm fairly new to Linq so for the sake of time I opted for the code below. I would however like to know what a better way might be.
Also note: As I tend to work with larger teams with varying experience levels, I'm not all that crazy about ultra compact code unless it remains very readable to the average developer.
Any suggestions?
public IEnumerable<TestDistribution> GetDistribution()
{
var distribution = new List<TestDistribution>();
var ranges = new int[] { 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 };
var labels = new string[] { "0%'s", "10%'s", "20%'s", "30%'s", "40%'s", "50%'s", "60%'s", "70%'s", "80%'s", "90%'s", "100%'s", ">110% "};
for (var n = 0; n < ranges.Count(); n++)
{
var count = 0;
var min = ranges[n];
var max = (n == ranges.Count() - 1) ? decimal.MaxValue : ranges[n+1];
count = (from d in Results
where d.Percentage>= min
&& d.Percentage<max
select d)
.Count();
distribution.Add(new TestDistribution() { Label = labels[n], Frequency = count });
}
return distribution;
}
// ranges and labels in a list of pairs of them
var rangesWithLabels = ranges.Zip(labels, (r,l) => new {Range = r, Label = l});
// create a list of intervals (ie. 0-10, 10-20, .. 110 - max value
var rangeMinMax = ranges.Zip(ranges.Skip(1), (min, max) => new {Min = min, Max = max})
.Union(new[] {new {Min = ranges.Last(), Max = Int32.MaxValue}});
//the grouping is made by the lower bound of the interval found for some Percentage
var resultsDistribution = from c in Results
group c by
rangeMinMax.FirstOrDefault(r=> r.Min <= c.Percentage && c.Percentage < r.Max).Min into g
select new {Percentage = g.Key, Frequency = g.Count() };
// left join betweem the labels and the results with frequencies
var distributionWithLabels =
from l in rangesWithLabels
join r in resultsDistribution on l.Range equals r.Percentage
into rd
from r in rd.DefaultIfEmpty()
select new TestDistribution{
Label = l.Label,
Frequency = r != null ? r.Frequency : 0
};
distribution = distributionWithLabels.ToList();
Another solution if the ranges and labels can be created in another way
var ranges = Enumerable.Range(0, 10)
.Select(c=> new {
Min = c * 10,
Max = (c +1 )* 10,
Label = (c * 10) + "%'s"})
.Union(new[] { new {
Min = 100,
Max = Int32.MaxValue,
Label = ">110% "
}});
var resultsDistribution = from c in Results
group c by ranges.FirstOrDefault(r=> r.Min <= c.Percentage && c.Percentage < r.Max).Min
into g
select new {Percentage = g.Key, Frequency = g.Count() };
var distributionWithLabels =
from l in ranges
join r in resultsDistribution on l.Min equals r.Percentage
into rd
from r in rd.DefaultIfEmpty()
select new TestDistribution{
Label = l.Label,
Frequency = r != null ? r.Frequency : 0
};
This works
public IEnumerable<TestDistribution> GetDistribution()
{
var range = 12;
return Enumerable.Range(0, range).Select(
n => new TestDistribution
{
Label = string.Format("{1}{0}%'s", n*10, n==range-1 ? ">" : ""),
Frequency =
Results.Count(
d =>
d.Percentage >= n*10
&& d.Percentage < ((n == range - 1) ? decimal.MaxValue : (n+1)*10))
});
}

LINQ Query to Find the Maximum Mean for a Time Span

I have a set of data that has two points; "watts" and a time stamp.
Each data point is separated by 1 second.
So it looks like this:
0:01 100
0:02 110
0:03 133
0:04 280
.....
The data set is a couple hours long.
I'd like to write a query where I can find the maximum average watts for different time periods (5 seconds, 1 minutes, 5 minutes, 20 minutes, ect).
I'd also like to know where in the data set that maximum average took place.
Edit
I think I need to do a query with a moving average and the appropriate bucket (let's say 10 seconds). Once I get that result, I query that to find the max.
Try this (I used Linqpad, C# statements):
var rnd = new Random();
// Create some data.
var tw = Enumerable.Range(0, 3600)
.Select(i => Tuple.Create(new TimeSpan(0, 0, i), rnd.Next(1000))).ToList();
// The query.
int secondsPerInterval = 10;
var averages =
tw.GroupBy(t => (int) (t.Item1.TotalSeconds/secondsPerInterval) + 1)
.Select(g => new
{
Seconds = g.Key * secondsPerInterval,
Avg = g.Average(t => t.Item2)
})
.ToList();
var max = averages.Where(tmp => tmp.Avg == averages.Max(tmp1 => tmp1.Avg));
max.Dump();
The trick is to group your timespans by the integral part of TotalSeconds divided by the required interval length.
You could do tw.AsParallel().GroupBy..., but you should benchmark if you loose more by parallellization overhead than you gain.
Okay, a guy at work helped me. Here's the answer in LINQ Pad.
var period = 10;
var rnd = new Random();
// Create some data.
var series = Enumerable.Range(0, 3600)
.Select(i => Tuple.Create(new TimeSpan(0, 0, i), rnd.Next(300))).ToList();
var item = Enumerable.Range(0, 3600).AsParallel()
.Select(i => series.Skip(i).Take(10))
.Select((e, i) => new { Average = e.Sum(x => x.Item2) / e.Count(), Second = i })
.OrderByDescending(a => a.Second).Dump();
item.First().Dump();
try this (untested):
for (int i = 0; i < = dataList.count ; i = i + (TimePeriod))
(from p in dataList.Skip(i).Take(TimePeriod) select p).Average(s => s.Watts)

LINQ: GroupBy with maximum count in each group

I have a list of duplicate numbers:
Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
// {1,1,1,2,2,2,3,3,3}
I group them and get quantity of occurance:
Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
.GroupBy(o => o).Select(o => new { Qty = o.Count(), Num = o.Key })
Qty Num
3 1
3 2
3 3
What I really need is to limit the quantity per group to some number. If the limit is 2 the result for the above grouping would be:
Qty Num
2 1
1 1
2 2
1 2
2 3
1 3
So, if Qty = 10 and limit is 4, the result is 3 rows (4, 4, 2). The Qty of each number is not equal like in example. The specified Qty limit is the same for whole list (doesn't differ based on number).
Thanks
Some of the other answers are making the LINQ query far more complex than it needs to be. Using a foreach loop is certainly faster and more efficient, but the LINQ alternative is still fairly straightforward.
var input = Enumerable.Range(1, 3).SelectMany(x => Enumerable.Repeat(x, 10));
int limit = 4;
var query =
input.GroupBy(x => x)
.SelectMany(g => g.Select((x, i) => new { Val = x, Grp = i / limit }))
.GroupBy(x => x, x => x.Val)
.Select(g => new { Qty = g.Count(), Num = g.Key.Val });
There was a similar question that came up recently asking how to do this in SQL - there's no really elegant solution and unless this is Linq to SQL or Entity Framework (i.e. being translated into a SQL query), I'd really suggest that you not try to solve this problem with Linq and instead write an iterative solution; it's going to be a great deal more efficient and easier to maintain.
That said, if you absolutely must use a set-based ("Linq") method, this is one way you could do it:
var grouped =
from n in nums
group n by n into g
select new { Num = g.Key, Qty = g.Count() };
int maxPerGroup = 2;
var portioned =
from x in grouped
from i in Enumerable.Range(1, grouped.Max(g => g.Qty))
where (x.Qty % maxPerGroup) == (i % maxPerGroup)
let tempQty = (x.Qty / maxPerGroup) == (i / maxPerGroup) ?
(x.Qty % maxPerGroup) : maxPerGroup
select new
{
Num = x.Num,
Qty = (tempQty > 0) ? tempQty : maxPerGroup
};
Compare with the simpler and faster iterative version:
foreach (var g in grouped)
{
int remaining = g.Qty;
while (remaining > 0)
{
int allotted = Math.Min(remaining, maxPerGroup);
yield return new MyGroup(g.Num, allotted);
remaining -= allotted;
}
}
Aaronaught's excellent answer doesn't cover the possibility of getting the best of both worlds... using an extension method to provide an iterative solution.
Untested:
public static IEnumerable<IEnumerable<U>> SplitByMax<T, U>(
this IEnumerable<T> source,
int max,
Func<T, int> maxSelector,
Func<T, int, U> resultSelector
)
{
foreach(T x in source)
{
int number = maxSelector(x);
List<U> result = new List<U>();
do
{
int allotted = Math.Min(number, max);
result.Add(resultSelector(x, allotted));
number -= allotted
} while (number > 0 && max > 0);
yield return result;
}
}
Called by:
var query = grouped.SplitByMax(
10,
o => o.Qty,
(o, i) => new {Num = o.Num, Qty = i}
)
.SelectMany(split => split);

Resources