Guard against divide-by-0 in a LINQ GroupBy Select statement

I have the following LINQ statement:
var result = CommisionDataContext.COMMISSIONS
    .Where(c => c.PRODUCT == "Computer")
    .GroupBy(c => new
    {
        CostCenter = c.COST_CENTER,
        Product = c.PRODUCT,
    })
    .Select(group => new
    {
        Revenue = group.Sum(p => p.REVENUE),
        Volume = group.Sum(p => p.VOLUME),
        Avg = group.Sum(p => p.REVENUE) / group.Sum(p => p.VOLUME),
    });
How can I guard against the divide-by-zero exception that could happen here? If it does happen, I just want Avg to equal 0.

To avoid summing more than necessary, you should be able to do something like this (note that the code as originally posted did not compile):
...
.Select(group => new
{
    Revenue = group.Sum(p => p.REVENUE),
    Volume = group.Sum(p => p.VOLUME)
})
.Select(item => new
{
    item.Revenue,
    item.Volume,
    Avg = item.Volume == 0 ? 0 : item.Revenue / item.Volume
});
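For a self-contained illustration of this two-step Select pattern, here is a minimal sketch against in-memory data (the Commission class, its decimal columns, and the sample values are assumptions, not the poster's actual schema):
using System;
using System.Linq;

class Commission
{
    public string COST_CENTER = "", PRODUCT = "";
    public decimal REVENUE, VOLUME;
}

class Demo
{
    static void Main()
    {
        var commissions = new[]
        {
            new Commission { COST_CENTER = "A", PRODUCT = "Computer", REVENUE = 100m, VOLUME = 4m },
            new Commission { COST_CENTER = "B", PRODUCT = "Computer", REVENUE = 50m, VOLUME = 0m },
        };
        var result = commissions
            .Where(c => c.PRODUCT == "Computer")
            .GroupBy(c => new { CostCenter = c.COST_CENTER, Product = c.PRODUCT })
            // First project the sums, then compute Avg from the already-summed values.
            .Select(g => new { Revenue = g.Sum(p => p.REVENUE), Volume = g.Sum(p => p.VOLUME) })
            .Select(x => new { x.Revenue, x.Volume, Avg = x.Volume == 0 ? 0 : x.Revenue / x.Volume });
        foreach (var r in result)
            Console.WriteLine($"Revenue={r.Revenue} Volume={r.Volume} Avg={r.Avg}"); // Avg=25, then Avg=0
    }
}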

Be careful: the C# double type can represent Infinity values. In fact, if you try
var x = 5.0 / 0.0;
you will find that x is Infinity (no exception is thrown), and that is really different from 0! So if you want your Avg to be 0, be sure that is correct from a mathematical and representational point of view. (For example, I work with marketing formulas and charts, and it's important to know whether an average is a true 0 or an approximation, to avoid misleading chart representations.) Note also that 0.0 / 0.0 yields NaN rather than Infinity, so if both sums can be zero you may want to check Double.IsNaN as well.
Anyway, if the exact value of the division is useful to you, the following avoids a divide-by-zero exception (you need to declare Avg as double), leaving you to handle the Infinity returned when the volume sum is 0:
Avg = Convert.ToDouble(group.Sum(p => p.REVENUE)) / group.Sum(p => p.VOLUME)
Otherwise you can compute the value once and check it before assigning:
var x = Convert.ToDouble(group.Sum(p => p.REVENUE)) / group.Sum(p => p.VOLUME);
Avg = Double.IsInfinity(x) ? 0 : x;
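To see these floating-point rules in isolation, here is a minimal standalone snippet (illustrative, not part of the original answer):
double a = 5.0 / 0.0;    // double.PositiveInfinity, no exception thrown
double b = -5.0 / 0.0;   // double.NegativeInfinity
double c = 0.0 / 0.0;    // double.NaN
Console.WriteLine(double.IsInfinity(a)); // True
Console.WriteLine(double.IsNaN(c));      // True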

Just replace
Avg = group.Sum(p => p.REVENUE) / group.Sum(p => p.VOLUME),
with
Avg = group.Sum(p => p.VOLUME) == 0
? 0
: group.Sum(p => p.REVENUE) / group.Sum(p => p.VOLUME),

Related

Generate a GeoJSON heatmap using MongoDB map/reduce

I am visualising annual UK film screenings using a JavaScript map that takes GeoJSON as input. You can see it working for 2011 data here: http://screened2011.herokuapp.com
The generation of the GeoJSON is very inefficient, often taking 5+ seconds per "tile".
I have a Ruby app which queries MongoDB for a set of "screenings" within a bounding box (as requested by the JS) and then generates a two-dimensional array representing the total number of screenings that occurred within each cell of a 16x16 grid. This is apparently the bottleneck: it hits the server and pulls down all of these screenings.
I'd like to replace this with a map/reduce query that aggregates the count of all screenings within a bounding box into a 16x16 array of values, but I'm not having much success; it's quite the task for my first map/reduce!
Here's a simplified version of my code with unrelated stuff taken out (it's awful, and if this weren't a hack coming to an end, I'd refactor):
get :boxed, :map => "/screenings/year/:year/quadrant/:area/bbox/:bbox", :provides => [:json, :jsonp], :cache => false do
  box = parameter_box(params[:bbox]) # returns [[minx,miny],[maxx,maxy]]
  year = params[:year].to_i
  screenings = Screening.where(:location.within(:box) => box).where(:year => year)
  jsonp Screening.heatmap(screenings, box, 16)
end

def self.heatmap screenings, box, scale
  grid = []
  min_x = box[0][0]
  min_y = box[0][1]
  max_x = box[1][0]
  max_y = box[1][1]
  box_width = max_x.to_f - min_x.to_f
  box_height = max_y.to_f - min_y.to_f
  return [] if box_width == 0 || box_height == 0

  # Set up an empty GeoJSON-style array to hold the results
  scalef = scale.to_f
  (0..(scale - 1)).each do |i|
    grid[i] = []
    (0..(scale - 1)).each do |j|
      box_min_x = min_x + (i * (box_width / scalef))
      box_max_x = min_x + ((i + 1) * (box_width / scalef))
      box_min_y = min_y + (j * (box_height / scalef))
      box_max_y = min_y + ((j + 1) * (box_height / scalef))
      grid[i][j] = {
        :count => 0,
        #:id => "#{box_min_x}_#{box_max_x}_#{box_min_y}_#{box_max_y}",
        :coordinates => [
          [
            [box_min_x, box_min_y], [box_min_x, box_max_y], [box_max_x, box_max_y], [box_max_x, box_min_y], [box_min_x, box_min_y]
          ]
        ]
      }
    end
  end

  # This loop is the bottleneck and I'd like to replace it with a map/reduce
  screenings.only(:location, :total_showings).each do |screening|
    x = (scale * ((screening.location[0] - min_x) / box_width)).floor
    y = (scale * ((screening.location[1] - min_y) / box_height)).floor
    raise if x > (scale - 1)
    raise if y > (scale - 1)
    grid[x][y][:count] += screening.total_showings.to_i
  end

  # This converts the resulting 16x16 grid into GeoJSON
  places = []
  grid.each do |x|
    x.each do |p|
      if p[:count].to_i > 0
        properties = {}
        properties[:total_showings] = p[:count]
        places << {
          "id" => p[:id],
          "type" => "Feature",
          "geometry" => {
            "type" => "Polygon",
            "coordinates" => p[:coordinates]
          },
          "properties" => properties
        }
      end
    end
  end

  {
    "type" => "FeatureCollection",
    "features" => places
  }
end
I'm using Mongoid, so I could chain a map/reduce onto the screenings query, and I'm hoping this would greatly speed up the process. But how should I go about getting something like the following to pass into this function?
[
[1,20000,30,3424,53,66,7586,54543,76764,4322,7664,43242,43,435,32,643],
...
]
...based on several million records in this structure (essentially summing total_showings for every record within the bounding box):
{"_id"=>BSON::ObjectId('50e481e653e6dfbc92057e8d'),
"created_at"=>2013-01-02 18:52:22 +0000,
"ended_at"=>Thu, 07 Jun 2012 00:00:00 +0100,
"events"=>["10044735484"],
"film_id"=>BSON::ObjectId('4f96a91153e6df5ebc001afe'),
"genre_ids"=>[],
"location"=>[-2.003309596016, 52.396317185921],
"performance_id"=>"9001923080",
"specialised"=>false,
"started_at"=>Fri, 01 Jun 2012 00:00:00 +0100,
"total_showings"=>1,
"updated_at"=>2013-01-02 18:52:22 +0000,
"venue_id"=>BSON::ObjectId('4f9500bf53e6df004000034d'),
"week"=>nil,
"year"=>2012}
Thanks in advance folks!

LINQ Query to Find the Maximum Mean for a Time Span

I have a set of data points with two values: "watts" and a timestamp.
Each data point is separated by 1 second.
So it looks like this:
0:01 100
0:02 110
0:03 133
0:04 280
.....
The data set is a couple hours long.
I'd like to write a query that finds the maximum average watts over different time periods (5 seconds, 1 minute, 5 minutes, 20 minutes, etc.).
I'd also like to know where in the data set that maximum average took place.
Edit
I think I need to do a query with a moving average and the appropriate bucket (let's say 10 seconds). Once I get that result, I query that to find the max.
Try this (I used LINQPad, C# statements):
var rnd = new Random();
// Create some data.
var tw = Enumerable.Range(0, 3600)
    .Select(i => Tuple.Create(new TimeSpan(0, 0, i), rnd.Next(1000))).ToList();
// The query.
int secondsPerInterval = 10;
var averages =
    tw.GroupBy(t => (int)(t.Item1.TotalSeconds / secondsPerInterval) + 1)
      .Select(g => new
      {
          Seconds = g.Key * secondsPerInterval,
          Avg = g.Average(t => t.Item2)
      })
      .ToList();
var max = averages.Where(tmp => tmp.Avg == averages.Max(tmp1 => tmp1.Avg));
max.Dump();
The trick is to group your timespans by the integral part of TotalSeconds divided by the required interval length.
You could do tw.AsParallel().GroupBy..., but you should benchmark whether you lose more to parallelization overhead than you gain.
Okay, a guy at work helped me. Here's the answer in LINQPad.
var period = 10;
var rnd = new Random();
// Create some data.
var series = Enumerable.Range(0, 3600)
    .Select(i => Tuple.Create(new TimeSpan(0, 0, i), rnd.Next(300))).ToList();
// One moving-average window starts at each second; Average avoids integer division.
var item = Enumerable.Range(0, 3600 - period + 1).AsParallel()
    .Select(i => new { Average = series.Skip(i).Take(period).Average(x => x.Item2), Second = i })
    .OrderByDescending(a => a.Average).Dump();
item.First().Dump();
try this (untested):
var averages = new List<double>();
for (int i = 0; i < dataList.Count; i += TimePeriod)
    averages.Add(dataList.Skip(i).Take(TimePeriod).Average(s => s.Watts));
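All of the approaches above re-sum every window, which is O(n * window). For a long series, a running-sum sliding window finds the same maximum in O(n). A minimal sketch, assuming the same (TimeSpan, int) tuples as the LINQPad examples (the class and method names here are hypothetical):
using System;
using System.Collections.Generic;

static class MovingAverage
{
    // Returns the maximum window average and the second at which that window starts.
    public static (double MaxAvg, int StartSecond) Max(IReadOnlyList<Tuple<TimeSpan, int>> series, int window)
    {
        if (series.Count < window) throw new ArgumentException("series shorter than window");
        long sum = 0, bestSum = long.MinValue;
        int bestStart = 0;
        for (int i = 0; i < series.Count; i++)
        {
            sum += series[i].Item2;                           // sample entering the window
            if (i >= window) sum -= series[i - window].Item2; // sample leaving the window
            if (i >= window - 1 && sum > bestSum)
            {
                bestSum = sum;
                bestStart = i - window + 1;
            }
        }
        return ((double)bestSum / window, bestStart);
    }
}
Called as, e.g., var (avg, start) = MovingAverage.Max(series, 10);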

Linq OrderByDescending but keep zero value first

I have a collection of integers that I want to order by descending value, with the exception of keeping a 0 value as first in the list.
For example:
0,1,2,3,4,5,6,7,8
Should result in:
0,8,7,6,5,4,3,2,1
Thanks!
var input = new [] {0,1,2,3,4,5,6,7,8};
Two sortings which work with both negative and positive numbers:
var result = input.OrderBy(i => i == 0 ? 0 : 1).ThenByDescending(i => i);
or this if all your numbers are non-negative:
var result = input.OrderByDescending(i => i == 0 ? int.MaxValue : i);
or some really weird solution if you have both negative and positive numbers but don't want to sort twice (as the first solution does):
var result = input
    .GroupBy(i => i == 0 ? 0 : 1)
    .OrderBy(g => g.Key)
    .Select(g => g.Key == 0 ? g : (IEnumerable<int>)g.OrderByDescending(i => i))
    .SelectMany(g => g);

How to calculate multiple averages in one query in linq to entities

How to do this in linq to entities in one query?
SELECT avg(Column1), avg(Column2), ... from MyTable
where ColumnX = 234
??
You could do something like this:
var averages = myTable
    .Where(item => item.ColumnX == 234)
    .AsEnumerable()
    .Aggregate(
        new { count = 0, sum1 = 0.0, sum2 = 0.0 },
        (acc, item) => new { count = acc.count + 1, sum1 = acc.sum1 + item.Column1, sum2 = acc.sum2 + item.Column2 },
        acc => new { avg1 = acc.sum1 / acc.count, avg2 = acc.sum2 / acc.count });
Note the call to AsEnumerable() to force the Aggregate to be executed locally (EF probably doesn't know how to convert it to SQL). Actually it seems to work ;)
Alternatively, you could use this query:
var averages =
from item in table
where item.ColumnX == 234
group item by 1 into g
select new
{
Average1 = g.Average(i => i.Column1),
Average2 = g.Average(i => i.Column2)
};
The use of group by here is not very intuitive, but it's probably easier to read than the other solution. Not sure it can be converted to SQL though...
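For reference, here is the same group-by-constant trick in method syntax (a sketch using the same assumed table and column names as above); grouping every row under the constant key 1 produces a single group, so both averages are computed in one pass:
var averages = myTable
    .Where(item => item.ColumnX == 234)
    .GroupBy(item => 1)
    .Select(g => new
    {
        Average1 = g.Average(i => i.Column1),
        Average2 = g.Average(i => i.Column2)
    });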

LINQ: GroupBy with maximum count in each group

I have a list of duplicate numbers:
Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
// {1,1,1,2,2,2,3,3,3}
I group them and get the quantity of occurrences:
Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
.GroupBy(o => o).Select(o => new { Qty = o.Count(), Num = o.Key })
Qty Num
3 1
3 2
3 3
What I really need is to limit the quantity per group to some number. If the limit is 2 the result for the above grouping would be:
Qty Num
2 1
1 1
2 2
1 2
2 3
1 3
So, if Qty is 10 and the limit is 4, the result is 3 rows (4, 4, 2). The Qty of each number is not necessarily equal, unlike in the example. The specified Qty limit is the same for the whole list (it doesn't differ by number).
Thanks
Some of the other answers are making the LINQ query far more complex than it needs to be. Using a foreach loop is certainly faster and more efficient, but the LINQ alternative is still fairly straightforward.
var input = Enumerable.Range(1, 3).SelectMany(x => Enumerable.Repeat(x, 10));
int limit = 4;
var query =
    input.GroupBy(x => x)
         .SelectMany(g => g.Select((x, i) => new { Val = x, Grp = i / limit }))
         .GroupBy(x => x, x => x.Val)
         .Select(g => new { Qty = g.Count(), Num = g.Key.Val });
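To sanity-check the query, enumerating it should produce chunks of at most limit per number (output illustrative):
foreach (var row in query)
    Console.WriteLine($"Qty={row.Qty} Num={row.Num}");
// With ten of each number and limit = 4:
// Qty=4 Num=1
// Qty=4 Num=1
// Qty=2 Num=1
// ...and likewise for 2 and 3.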
There was a similar question recently asking how to do this in SQL; there's no really elegant solution, and unless this is LINQ to SQL or Entity Framework (i.e. being translated into a SQL query), I'd really suggest that you not try to solve this problem with LINQ and instead write an iterative solution; it's going to be a great deal more efficient and easier to maintain.
That said, if you absolutely must use a set-based ("LINQ") method, this is one way you could do it:
var grouped =
    from n in nums
    group n by n into g
    select new { Num = g.Key, Qty = g.Count() };

int maxPerGroup = 2;

var portioned =
    from x in grouped
    from i in Enumerable.Range(1, grouped.Max(g => g.Qty))
    where (x.Qty % maxPerGroup) == (i % maxPerGroup)
    let tempQty = (x.Qty / maxPerGroup) == (i / maxPerGroup)
        ? (x.Qty % maxPerGroup)
        : maxPerGroup
    select new
    {
        Num = x.Num,
        Qty = (tempQty > 0) ? tempQty : maxPerGroup
    };
Compare with the simpler and faster iterative version:
foreach (var g in grouped)
{
    int remaining = g.Qty;
    while (remaining > 0)
    {
        int allotted = Math.Min(remaining, maxPerGroup);
        yield return new MyGroup(g.Num, allotted);
        remaining -= allotted;
    }
}
Aaronaught's excellent answer doesn't cover the possibility of getting the best of both worlds... using an extension method to provide an iterative solution.
Untested:
public static IEnumerable<IEnumerable<U>> SplitByMax<T, U>(
    this IEnumerable<T> source,
    int max,
    Func<T, int> maxSelector,
    Func<T, int, U> resultSelector)
{
    foreach (T x in source)
    {
        int number = maxSelector(x);
        List<U> result = new List<U>();
        do
        {
            int allotted = Math.Min(number, max);
            result.Add(resultSelector(x, allotted));
            number -= allotted;
        } while (number > 0 && max > 0);
        yield return result;
    }
}
Called by:
var query = grouped.SplitByMax(
        10,
        o => o.Qty,
        (o, i) => new { Num = o.Num, Qty = i })
    .SelectMany(split => split);
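A concrete (hypothetical) run of SplitByMax, with ten 1s, ten 2s, and a chunk size of 4:
var nums = Enumerable.Repeat(1, 10).Concat(Enumerable.Repeat(2, 10));
var grouped = nums.GroupBy(n => n)
                  .Select(g => new { Num = g.Key, Qty = g.Count() });
var rows = grouped.SplitByMax(4, o => o.Qty, (o, i) => new { o.Num, Qty = i })
                  .SelectMany(split => split);
foreach (var r in rows)
    Console.WriteLine($"Num={r.Num} Qty={r.Qty}");
// Num=1 Qty=4
// Num=1 Qty=4
// Num=1 Qty=2
// Num=2 Qty=4
// Num=2 Qty=4
// Num=2 Qty=2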
