LINQ: GroupBy with maximum count in each group

I have a list of duplicate numbers:
Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
// {1,1,1,2,2,2,3,3,3}
I group them and get the number of occurrences:
Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
.GroupBy(o => o).Select(o => new { Qty = o.Count(), Num = o.Key })
Qty Num
3 1
3 2
3 3
What I really need is to limit the quantity per group to some number. If the limit is 2, the result for the above grouping would be:
Qty Num
2 1
1 1
2 2
1 2
2 3
1 3
So, if Qty = 10 and the limit is 4, the result is 3 rows (4, 4, 2). The Qty of each number is not necessarily equal, as it is in the example. The Qty limit is the same for the whole list (it doesn't differ by number).
Thanks

Some of the other answers are making the LINQ query far more complex than it needs to be. Using a foreach loop is certainly faster and more efficient, but the LINQ alternative is still fairly straightforward.
var input = Enumerable.Range(1, 3).SelectMany(x => Enumerable.Repeat(x, 10));
int limit = 4;
var query =
    input.GroupBy(x => x)
         // i / limit numbers the chunks inside each group: 0 for the first `limit` items, 1 for the next, ...
         .SelectMany(g => g.Select((x, i) => new { Val = x, Grp = i / limit }))
         .GroupBy(x => x, x => x.Val)
         .Select(g => new { Qty = g.Count(), Num = g.Key.Val });
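A self-contained sketch of the same index-based technique, with value tuples standing in for the anonymous types so the rows are easy to check (SplitGroups is a hypothetical name). It prints one "Qty Num" row per chunk:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Split each run of equal numbers into chunks of at most `limit` items.
// Within a group, item i belongs to chunk i / limit (0, 0, 0, 0, 1, 1, ...).
static List<(int Num, int Qty)> SplitGroups(IEnumerable<int> source, int limit) =>
    source.GroupBy(x => x)
          .SelectMany(g => g.Select((x, i) => (Val: x, Grp: i / limit)))
          // Value tuples compare by value, so equal (Val, Grp) pairs share a group.
          .GroupBy(p => p, p => p.Val)
          .Select(g => (Num: g.Key.Val, Qty: g.Count()))
          .ToList();

var input = Enumerable.Range(1, 3).SelectMany(x => Enumerable.Repeat(x, 10));
var rows = SplitGroups(input, 4);

foreach (var (num, qty) in rows)
    Console.WriteLine($"{qty} {num}");
```

GroupBy keeps groups in order of first appearance, so the chunks of each number come out together.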

A similar question came up recently asking how to do this in SQL, and there's no really elegant solution. Unless this is Linq to SQL or Entity Framework (i.e. being translated into a SQL query), I'd really suggest not solving this problem with Linq and writing an iterative solution instead; it will be a great deal more efficient and easier to maintain.
That said, if you absolutely must use a set-based ("Linq") method, this is one way you could do it:
// nums is the input sequence, e.g. {1,1,1,2,2,2,3,3,3}
var grouped =
    from n in nums
    group n by n into g
    select new { Num = g.Key, Qty = g.Count() };
int maxPerGroup = 2;
var portioned =
    from x in grouped
    // Range over this group's own Qty; ranging up to the largest Qty of all
    // groups would emit extra rows for the smaller groups.
    from i in Enumerable.Range(1, x.Qty)
    where (x.Qty % maxPerGroup) == (i % maxPerGroup)
    let tempQty = (x.Qty / maxPerGroup) == (i / maxPerGroup) ?
        (x.Qty % maxPerGroup) : maxPerGroup
    select new
    {
        Num = x.Num,
        Qty = (tempQty > 0) ? tempQty : maxPerGroup
    };
Compare with the simpler and faster iterative version:
// Body of an iterator method; MyGroup is assumed to be a simple (Num, Qty) class.
foreach (var g in grouped)
{
    int remaining = g.Qty;
    while (remaining > 0)
    {
        int allotted = Math.Min(remaining, maxPerGroup);
        yield return new MyGroup(g.Num, allotted);
        remaining -= allotted;
    }
}
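The fragment above needs an enclosing iterator method and a MyGroup type, neither of which is shown. A runnable sketch, assuming a (Num, Qty) value tuple in place of MyGroup and a hypothetical Portion method name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Iterator that yields (Num, Qty) rows with each Qty capped at maxPerGroup.
// A value tuple replaces the answer's MyGroup type (assumed to hold Num and Qty).
static IEnumerable<(int Num, int Qty)> Portion(IEnumerable<int> nums, int maxPerGroup)
{
    // A non-positive cap would never reduce `remaining`, so reject it.
    // (Being an iterator, this check runs on first enumeration, not at the call.)
    if (maxPerGroup <= 0)
        throw new ArgumentOutOfRangeException(nameof(maxPerGroup));

    var grouped = nums.GroupBy(n => n).Select(grp => (Num: grp.Key, Qty: grp.Count()));
    foreach (var g in grouped)
    {
        int remaining = g.Qty;
        while (remaining > 0)
        {
            int allotted = Math.Min(remaining, maxPerGroup);
            yield return (g.Num, allotted);
            remaining -= allotted;
        }
    }
}

var portioned = Portion(new[] { 1, 1, 1, 2, 2, 2, 3, 3, 3 }, 2).ToList();

foreach (var (num, qty) in portioned)
    Console.WriteLine($"{qty} {num}");
```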

Aaronaught's excellent answer doesn't cover the possibility of getting the best of both worlds... using an extension method to provide an iterative solution.
Untested:
// Extension methods must live in a top-level static class (the class name is illustrative).
public static class SplitExtensions
{
    public static IEnumerable<IEnumerable<U>> SplitByMax<T, U>(
        this IEnumerable<T> source,
        int max,
        Func<T, int> maxSelector,
        Func<T, int, U> resultSelector
    )
    {
        foreach (T x in source)
        {
            int number = maxSelector(x);
            List<U> result = new List<U>();
            do
            {
                // Take at most `max` from what is left for this item.
                int allotted = Math.Min(number, max);
                result.Add(resultSelector(x, allotted));
                number -= allotted;
            } while (number > 0 && max > 0);
            yield return result;
        }
    }
}
Called by:
var query = grouped.SplitByMax(
        10,
        o => o.Qty,
        (o, i) => new { Num = o.Num, Qty = i }
    )
    .SelectMany(split => split);
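On .NET 6 or later, the built-in Enumerable.Chunk(size) splits a sequence into arrays of at most size elements, which gives a similar iterative split without a custom extension method. A minimal sketch (SplitWithChunk is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Requires .NET 6+: Enumerable.Chunk(size) yields arrays of at most `size`
// elements, the last one holding whatever is left over.
static List<(int Num, int Qty)> SplitWithChunk(IEnumerable<int> source, int limit) =>
    source.GroupBy(x => x)
          .SelectMany(g => g.Chunk(limit).Select(chunk => (Num: g.Key, Qty: chunk.Length)))
          .ToList();

var input = Enumerable.Range(1, 3).SelectMany(x => Enumerable.Repeat(x, 10));
var rows = SplitWithChunk(input, 4);

Console.WriteLine(string.Join(", ", rows));
```

Chunk throws ArgumentOutOfRangeException when size is less than 1, so a zero limit fails loudly instead of looping.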

Related

Minimum number of steps using only multiply A by 2, or divide A by 2 or increment A by one to go from number A to B

Given two numbers A and B, what is the minimum number of steps needed to transform A into B?
A step can be A *= 2, A++, or A /= 2 (the last only if A is even).
What is the most efficient algorithm to achieve this?
Suppose A and B can be really large numbers.
Here's my take, done in C#.
var a = 2;
var b = 15;
var found = new HashSet<int>() { a };
var operations = new (string operation, Func<int, bool> condition, Func<int, int> projection)[]
{
    ("/2", x => x % 2 == 0, x => x / 2),
    ("*2", x => x <= int.MaxValue / 2, x => x * 2),
    ("+1", x => true, x => x + 1),
};
IEnumerable<(int count, string operations, int value)> Project((int count, string operations, int value) current)
{
    foreach (var operation in operations)
    {
        if (operation.condition(current.value))
        {
            var value = operation.projection(current.value);
            if (!found.Contains(value))
            {
                found.Add(value);
                yield return (current.count + 1, $"{current.operations}, {operation.operation}", value);
            }
        }
    }
}
var candidates = new[] { (count: 0, operations: $"{a}", value: a) };
while (!found.Contains(b))
{
    candidates =
        candidates
            .SelectMany(c => Project(c))
            .ToArray();
}
var result = candidates.Where(x => x.value == b).First();
Console.WriteLine($"{result.count} operations: {result.operations} = {result.value}");
That outputs:
5 operations: 2, +1, *2, +1, *2, +1 = 15
Basically, this starts with a at step zero. It then takes this generation and applies every operation to produce all possible values for the next generation. If it produces a value it has already seen, it discards it, because an equal or shorter sequence of operations already produces that value. It keeps repeating until b is found; since this is a breadth-first search, the generation in which b first appears gives the minimum number of steps.
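The same breadth-first idea can be written with a queue and a dictionary of step counts, which avoids carrying operation strings when only the count is needed. A sketch, assuming non-negative inputs that fit in a long (MinSteps is a hypothetical name). Like the original, it can visit a large number of intermediate values, so it is not suited to very large A and B:

```csharp
using System;
using System.Collections.Generic;

// Minimum number of operations (x * 2, x + 1, or x / 2 when x is even) to turn a into b.
// Breadth-first search: every operation costs one step, so the first time b is
// reached it is reached with the fewest steps. Assumes a and b are non-negative.
static int MinSteps(long a, long b)
{
    if (a == b) return 0;
    var steps = new Dictionary<long, int> { [a] = 0 };
    var queue = new Queue<long>();
    queue.Enqueue(a);

    while (queue.Count > 0)
    {
        long x = queue.Dequeue();
        int next = steps[x] + 1;

        var moves = new List<long>();
        if (x % 2 == 0) moves.Add(x / 2);             // halve, only when even
        if (x <= long.MaxValue / 2) moves.Add(x * 2); // double, skipping overflow
        if (x < long.MaxValue) moves.Add(x + 1);      // increment

        foreach (long y in moves)
        {
            if (steps.ContainsKey(y)) continue; // already reached in as few steps
            if (y == b) return next;
            steps[y] = next;
            queue.Enqueue(y);
        }
    }
    return -1; // not reached for non-negative a and b
}

Console.WriteLine(MinSteps(2, 15)); // 5, matching the output above
```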

Default values for empty groups in Linq GroupBy query

I have a data set of values that I want to summarise in groups. For each group, I want to create an array big enough to contain the values of the largest group. When a group contains fewer values than this maximum, I want to insert a default value of zero for the missing key values.
Dataset
Col1 Col2 Value
--------------------
A    X    10
A    Z    15
B    X    9
B    Y    12
B    Z    6
Desired result
X, [10, 9]
Y, [0, 12]
Z, [15, 6]
Note that value "A" in Col1 has no value for "Y" in Col2. Since "A" is the first group in the outer series, it is the first element that is missing.
The following query creates the result dataset, but does not insert the default zero values for the Y group.
var result = data.GroupBy(item => item.Col2)
    .Select(group => new
    {
        name = group.Key,
        data = group.Select(item => item.Value)
                    .ToArray()
    });
Actual result
X, [10, 9]
Y, [12]
Z, [15, 6]
What do I need to do to insert a zero as the missing group value?
Here is how I understand it.
Let's say we have this:
class Data
{
    public string Col1, Col2;
    public decimal Value;
}
Data[] source =
{
    new Data { Col1 = "A", Col2 = "X", Value = 10 },
    new Data { Col1 = "A", Col2 = "Z", Value = 15 },
    new Data { Col1 = "B", Col2 = "X", Value = 9 },
    new Data { Col1 = "B", Col2 = "Y", Value = 12 },
    new Data { Col1 = "B", Col2 = "Z", Value = 6 },
};
First we need to determine the "fixed" part, the full series of Col1 values:
var columns = source.Select(e => e.Col1).Distinct().OrderBy(c => c).ToList();
Then we can do the normal grouping, but inside each group we left join the columns with the group's elements, which gives the desired behavior:
var result = source.GroupBy(e => e.Col2, (key, elements) => new
    {
        Key = key,
        Elements = (from c in columns
                    join e in elements on c equals e.Col1 into g
                    from e in g.DefaultIfEmpty()
                    select e != null ? e.Value : 0).ToList()
    })
    .OrderBy(e => e.Key)
    .ToList();
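A self-contained version of the left-join approach above, with value tuples standing in for the Data class (an assumption for brevity). Because a tuple is a value type, DefaultIfEmpty() yields a default tuple rather than null, so a missing row is detected by its null Col1:

```csharp
using System;
using System.Linq;

// Rows as (Col1, Col2, Value) tuples instead of the Data class.
var source = new[]
{
    (Col1: "A", Col2: "X", Value: 10m),
    (Col1: "A", Col2: "Z", Value: 15m),
    (Col1: "B", Col2: "X", Value: 9m),
    (Col1: "B", Col2: "Y", Value: 12m),
    (Col1: "B", Col2: "Z", Value: 6m),
};

// The "fixed" outer series: every distinct Col1 value, in order.
var columns = source.Select(e => e.Col1).Distinct().OrderBy(c => c).ToList();

// Group by Col2, then left join the full column list against each group so
// that a missing Col1 yields 0 at that position.
var result = source
    .GroupBy(e => e.Col2, (key, elements) => (
        Key: key,
        Values: (from c in columns
                 join e in elements on c equals e.Col1 into matches
                 from m in matches.DefaultIfEmpty()
                 select m.Col1 != null ? m.Value : 0m).ToArray()))
    .OrderBy(g => g.Key)
    .ToList();

foreach (var (name, values) in result)
    Console.WriteLine($"{name}, [{string.Join(", ", values)}]");
```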
It won't be pretty, but you can do something like this (note that it pads at the end, so Y comes out as [12, 0] rather than [0, 12], and Enumerable.Repeat(0, ...) assumes Value is an int):
var groups = data.GroupBy(d => d.Col2, d => d.Value)
    .Select(g => new { g, count = g.Count() })
    .ToList();
int maxG = groups.Max(p => p.count);
var paddedGroups = groups.Select(p => new
{
    name = p.g.Key,
    data = p.g.Concat(Enumerable.Repeat(0, maxG - p.count)).ToArray()
});
You can do it like this:
int maxCount = 0;
var result = data.GroupBy(x => x.Col2)
    .OrderByDescending(x => x.Count())
    .Select(x =>
    {
        if (maxCount == 0)
            maxCount = x.Count();
        var Value = x.Select(z => z.Value);
        return new
        {
            name = x.Key,
            data = maxCount == x.Count() ? Value.ToArray() :
                Value.Concat(new int[maxCount - Value.Count()]).ToArray()
        };
    });
Code explanation:
Since you need to append default zeros when a group has fewer items, I store the maximum count that any group can produce in the variable maxCount. To find it, I order the groups by count in descending order, so the first group projected is the largest one, and its count is stored in maxCount. While projecting, I check whether the number of items in the group differs from maxCount; if it does, I create an integer array of size (maxCount - x.Count()), i.e. the maximum count minus the number of items in the current group, and append it to the group's values.
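The same end-padding can be done without mutating a captured variable inside Select (which relies on the groups being enumerated once, in order). A sketch that computes the maximum up front, assuming integer values and tuples in place of the question's row type; like the previous answer it pads at the end, so Y becomes [12, 0] rather than the [0, 12] the question asks for:

```csharp
using System;
using System.Linq;

var data = new[]
{
    (Col1: "A", Col2: "X", Value: 10),
    (Col1: "A", Col2: "Z", Value: 15),
    (Col1: "B", Col2: "X", Value: 9),
    (Col1: "B", Col2: "Y", Value: 12),
    (Col1: "B", Col2: "Z", Value: 6),
};

// Materialize the groups once so they are not regrouped on every pass.
var groups = data.GroupBy(x => x.Col2, x => x.Value).ToList();

// Largest group size, computed before projecting instead of inside Select.
int maxCount = groups.Max(g => g.Count());

// Pad each group with zeros at the end up to maxCount values.
var result = groups
    .Select(g => (Name: g.Key,
                  Data: g.Concat(Enumerable.Repeat(0, maxCount - g.Count())).ToArray()))
    .ToList();

foreach (var (name, values) in result)
    Console.WriteLine($"{name}, [{string.Join(", ", values)}]");
```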

Linq - return index of collection using conditional logic

I have a collection
List<int> periods = new List<int>();
periods.Add(0);
periods.Add(30);
periods.Add(60);
periods.Add(90);
periods.Add(120);
periods.Add(180);
var overDueDays = 31;
I have a variable overDueDays. When the value is between 0 and 29, I want to return index 0. When it is between 30 and 59, I want to return index 1. The periods list comes from the db, so it's not hard coded and the values can differ from the ones here. What is the best way to do it using LINQ in one statement?
It's not really what Linq is designed for, but (assuming that the range is not fixed) you could do the following to get the index:
List<int> periods = new List<int>();
periods.Add(0);
periods.Add(30);
periods.Add(60);
periods.Add(90);
periods.Add(120);
periods.Add(180);
var overDueDays = 31;
// First throws InvalidOperationException when overDueDays >= the last period (180 here).
var result = periods.IndexOf(periods.First(n => overDueDays < n)) - 1;
You can use .TakeWhile():
int periodIndex = periods.TakeWhile(p => p <= overDueDays).Count() - 1;
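A sketch of the TakeWhile approach as a small helper, checked against a few boundary values; it assumes periods is sorted ascending (PeriodIndex is a hypothetical name). Values at or beyond the last period map to the last index, and values below the first period give -1:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Index of the last period start that is <= overDueDays.
// Assumes periods is sorted ascending; returns -1 if overDueDays is below the first period.
static int PeriodIndex(IEnumerable<int> periods, int overDueDays) =>
    periods.TakeWhile(p => p <= overDueDays).Count() - 1;

var periods = new List<int> { 0, 30, 60, 90, 120, 180 };
foreach (var days in new[] { 0, 29, 30, 31, 59, 179, 180, 365 })
    Console.WriteLine($"{days} -> {PeriodIndex(periods, days)}");
```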
How about this?
var qPeriods = periods.Where(v => v <= overDueDays)
    .Select((value, i) => new { index = i })
    .Last();
// qPeriods.index holds the index; Last() throws if overDueDays is below the first period.
Assuming that periods is sorted, you can use the following approach:
var result = periods.Skip(1)
    .Select((o, i) => new { Index = i, Value = o })
    .FirstOrDefault(o => overDueDays < o.Value);
if (result != null)
{
    Console.WriteLine(result.Index);
}
else
{
    Console.WriteLine("Matching range not found!");
}
The first value is skipped since we're interested in comparing with the upper value of the range. By skipping it, the indices fall into place without the need to subtract 1. FirstOrDefault is used in case overDueDays doesn't fall between any of the available ranges.
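If periods is sorted and a non-LINQ answer is acceptable, List<T>.BinarySearch finds the index in O(log n): it returns the index of an exact match, or the bitwise complement of the index of the next larger element. A sketch (PeriodIndex is a hypothetical name); with duplicate period values it may return any one of the matching indices:

```csharp
using System;
using System.Collections.Generic;

// Requires periods sorted ascending.
static int PeriodIndex(List<int> periods, int overDueDays)
{
    int i = periods.BinarySearch(overDueDays);
    // No exact match: ~i is the index of the next larger period, so the
    // containing range starts one position earlier (-1 means below the first period).
    return i >= 0 ? i : ~i - 1;
}

var periods = new List<int> { 0, 30, 60, 90, 120, 180 };
Console.WriteLine(PeriodIndex(periods, 31));
```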

LINQ Grouping: Is there a cleaner way to do this without a for loop

I am trying to create a very simple distribution chart, and I want to display the counts of test score percentages in their corresponding ranges of 10.
I thought about just grouping on Math.Round((d.Percentage/10-0.5),0)*10, which should give me the tens value, but I wasn't sure of the best way to do this given that I would probably have missing ranges, and all ranges need to appear even if the count is zero. I also thought about doing an outer join on the ranges array, but since I'm fairly new to Linq, for the sake of time I opted for the code below. I would, however, like to know what a better way might be.
Also note: as I tend to work with larger teams with varying experience levels, I'm not all that crazy about ultra-compact code unless it remains very readable to the average developer.
Any suggestions?
public IEnumerable<TestDistribution> GetDistribution()
{
    var distribution = new List<TestDistribution>();
    var ranges = new int[] { 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 };
    var labels = new string[] { "0%'s", "10%'s", "20%'s", "30%'s", "40%'s", "50%'s", "60%'s", "70%'s", "80%'s", "90%'s", "100%'s", ">110% " };
    for (var n = 0; n < ranges.Count(); n++)
    {
        var count = 0;
        var min = ranges[n];
        var max = (n == ranges.Count() - 1) ? decimal.MaxValue : ranges[n + 1];
        count = (from d in Results
                 where d.Percentage >= min
                    && d.Percentage < max
                 select d)
                .Count();
        distribution.Add(new TestDistribution() { Label = labels[n], Frequency = count });
    }
    return distribution;
}
// ranges and labels as a list of pairs
var rangesWithLabels = ranges.Zip(labels, (r, l) => new { Range = r, Label = l });
// create a list of intervals (i.e. 0-10, 10-20, ..., 110 - max value)
var rangeMinMax = ranges.Zip(ranges.Skip(1), (min, max) => new { Min = min, Max = max })
    .Union(new[] { new { Min = ranges.Last(), Max = Int32.MaxValue } });
// the grouping is made by the lower bound of the interval found for each Percentage
var resultsDistribution = from c in Results
                          group c by
                              rangeMinMax.FirstOrDefault(r => r.Min <= c.Percentage && c.Percentage < r.Max).Min into g
                          select new { Percentage = g.Key, Frequency = g.Count() };
// left join between the labels and the results with frequencies
var distributionWithLabels =
    from l in rangesWithLabels
    join r in resultsDistribution on l.Range equals r.Percentage
        into rd
    from r in rd.DefaultIfEmpty()
    select new TestDistribution
    {
        Label = l.Label,
        Frequency = r != null ? r.Frequency : 0
    };
distribution = distributionWithLabels.ToList();
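The question's first idea, grouping on an arithmetic bucket key and then outer joining against all ranges, also works if the key uses Math.Floor. Math.Round((p/10-0.5),0) uses round-half-to-even by default, so 10% and 30% would land in the 0% and 20% buckets (0.5 and 2.5 round down to 0 and 2). A sketch with hypothetical sample percentages in place of Results:

```csharp
using System;
using System.Linq;

// Hypothetical sample standing in for Results.Select(d => d.Percentage).
var percentages = new[] { 5m, 10m, 15m, 30m, 99.9m, 100m, 125m };

const int bucketCount = 12; // 0%, 10%, ..., 100%, plus an open-ended 110%+ bucket

// Bucket index = floor(p / 10), capped so every value >= 110 lands in the last bucket.
// Assumes percentages are non-negative.
int Bucket(decimal p) => (int)Math.Min(Math.Floor(p / 10), bucketCount - 1);

var counts = percentages
    .GroupBy(p => Bucket(p))
    .ToDictionary(g => g.Key, g => g.Count());

// "Outer join" against every bucket index so empty ranges still appear with 0.
var distribution = Enumerable.Range(0, bucketCount)
    .Select(n => (Label: n == bucketCount - 1 ? $">{n * 10}%" : $"{n * 10}%'s",
                  Frequency: counts.TryGetValue(n, out var c) ? c : 0))
    .ToList();

foreach (var (label, frequency) in distribution)
    Console.WriteLine($"{label}: {frequency}");
```

The dictionary lookup plays the role of the left join: buckets with no scores simply have no key, and TryGetValue turns that into 0.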
Another solution, if the ranges and labels can be generated instead of listed by hand:
// 11 ranges (0%..100%) plus the open-ended 110% range, matching the question's 12 buckets
var ranges = Enumerable.Range(0, 11)
    .Select(c => new
    {
        Min = c * 10,
        Max = (c + 1) * 10,
        Label = (c * 10) + "%'s"
    })
    .Union(new[] { new
    {
        Min = 110,
        Max = Int32.MaxValue,
        Label = ">110% "
    }});
var resultsDistribution = from c in Results
                          group c by ranges.FirstOrDefault(r => r.Min <= c.Percentage && c.Percentage < r.Max).Min
                              into g
                          select new { Percentage = g.Key, Frequency = g.Count() };
var distributionWithLabels =
    from l in ranges
    join r in resultsDistribution on l.Min equals r.Percentage
        into rd
    from r in rd.DefaultIfEmpty()
    select new TestDistribution
    {
        Label = l.Label,
        Frequency = r != null ? r.Frequency : 0
    };
This works:
public IEnumerable<TestDistribution> GetDistribution()
{
    var range = 12;
    return Enumerable.Range(0, range).Select(
        n => new TestDistribution
        {
            Label = string.Format("{1}{0}%'s", n * 10, n == range - 1 ? ">" : ""),
            Frequency =
                Results.Count(
                    d =>
                        d.Percentage >= n * 10
                        && d.Percentage < ((n == range - 1) ? decimal.MaxValue : (n + 1) * 10))
        });
}

How to calculate multiple averages in one query in linq to entities

How to do this in linq to entities in one query?
SELECT avg(Column1), avg(Column2), ... from MyTable
where ColumnX = 234
??
You could do something like this:
var averages = myTable
    .Where(item => item.ColumnX == 234)
    .AsEnumerable()
    .Aggregate(
        new { count = 0, sum1 = 0.0, sum2 = 0.0 },
        (acc, item) => new { count = acc.count + 1, sum1 = acc.sum1 + item.Column1, sum2 = acc.sum2 + item.Column2 },
        acc => new { avg1 = acc.sum1 / acc.count, avg2 = acc.sum2 / acc.count });
Note the call to AsEnumerable(): it forces Aggregate to run locally on the filtered rows, since EF generally can't translate Aggregate into SQL. The Where filter still runs in the database, and both averages are computed in a single pass over the returned rows.
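An in-memory sketch of the single-pass Aggregate, with hypothetical rows and double columns (the real column types are not given). Against EF, Aggregate would run locally after an AsEnumerable() call, as the note above describes:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for MyTable (column types assumed to be double).
var myTable = new[]
{
    (ColumnX: 234, Column1: 2.0, Column2: 10.0),
    (ColumnX: 234, Column1: 4.0, Column2: 30.0),
    (ColumnX: 999, Column1: 100.0, Column2: 100.0),
};

// One pass over the filtered rows: accumulate the count and both sums,
// then divide once in the result selector.
var averages = myTable
    .Where(item => item.ColumnX == 234)
    .Aggregate(
        (count: 0, sum1: 0.0, sum2: 0.0),
        (acc, item) => (count: acc.count + 1, sum1: acc.sum1 + item.Column1, sum2: acc.sum2 + item.Column2),
        acc => (avg1: acc.sum1 / acc.count, avg2: acc.sum2 / acc.count));

Console.WriteLine($"{averages.avg1}, {averages.avg2}");
```

With no matching rows, count stays 0 and the double divisions produce NaN rather than throwing.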
Alternatively, you could use this query:
var averages =
    from item in table
    where item.ColumnX == 234
    group item by 1 into g
    select new
    {
        Average1 = g.Average(i => i.Column1),
        Average2 = g.Average(i => i.Column2)
    };
The use of group by here is not very intuitive, but it's probably easier to read than the other solution. Not sure it can be converted to SQL though...
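The group-by-constant trick can be checked in memory with hypothetical rows. The query yields a sequence, not a single object: one group when rows match, and no group at all when none do, so SingleOrDefault() returns null in that case:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for the table.
var table = new[]
{
    (ColumnX: 234, Column1: 2.0, Column2: 10.0),
    (ColumnX: 234, Column1: 4.0, Column2: 30.0),
    (ColumnX: 999, Column1: 100.0, Column2: 100.0),
};

// Grouping by a constant puts every filtered row into one group, so both
// averages come out of a single grouped query.
var averages =
    (from item in table
     where item.ColumnX == 234
     group item by 1 into g
     select new
     {
         Average1 = g.Average(i => i.Column1),
         Average2 = g.Average(i => i.Column2)
     })
    .SingleOrDefault(); // null when no row matches: no rows means no group

Console.WriteLine($"{averages?.Average1}, {averages?.Average2}");
```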
