How to group and count missing values using LINQ

How do I go about counting missing values using LINQ? Basically, I am counting occurrences within a particular month, and I want the count to show as zero if there were no entries for that month.
However, currently if there are no entries for a month, the array skips that month, as shown at index 5 below. I don't want this to happen because I am plotting the results on a chart, so from index 5 onwards the skipped dates are out of sync with the actual counts.
Below is my LINQ query:
var veterans = _db.Records
.Where(j => j.Requestor == "Veterans" && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) >= 0 && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) <= 24)
.GroupBy(g => new { g.Request_Date.Value.Year, g.Request_Date.Value.Month }).OrderBy(d => d.Key.Year).ThenBy(d => d.Key.Month)
.Select(group => new
{
Dates = group.Key,
Count = group.Count()
});
var veteransCount = veterans.Select(n => n.Count).ToArray();

You can create a helper function to generate the month+year enumeration you need:
public static IEnumerable<(int Year,int Month)> MonthsInYears(int fromYear, int fromMonth, int toYear, int toMonth) {
for (int year = fromYear; year <= toYear; ++year)
for (int month = (year == fromYear ? fromMonth : 1); month <= (year == toYear ? toMonth : 12); ++month)
yield return (year, month);
}
Then using this, you can create an enumeration of the period:
var veterans = _db.Records
.Where(j => j.Requestor == "Veterans" && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) >= 0 && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) <= 24)
.GroupBy(g => new { g.Request_Date.Value.Year, g.Request_Date.Value.Month }).OrderBy(d => d.Key.Year).ThenBy(d => d.Key.Month)
.Select(group => new {
YearMonth = group.Key,
Count = group.Count()
});
var minYearMonth = veterans.Select(v => v.YearMonth).First();
var maxYearMonth = veterans.Select(v => v.YearMonth).Last();
var monthsInYears = MonthsInYears(minYearMonth.Year, minYearMonth.Month, maxYearMonth.Year, maxYearMonth.Month);
Then you can GroupJoin (as a left join) to your database data:
var veteransCount = monthsInYears.GroupJoin(
veterans,
ym => new { ym.Year, ym.Month },
v => v.YearMonth,
(ym, sj) => sj.FirstOrDefault()?.Count ?? 0)
.ToArray();
Alternatively, since this is a specific case, you could create a Dictionary for your source data and lookup each enumeration value:
var veteransMap = veterans.ToDictionary(v => v.YearMonth, v => v.Count);
var veteransCount2 = monthsInYears.Select(ym => veteransMap.TryGetValue(new { ym.Year, ym.Month }, out var count) ? count : 0)
.ToArray();
NOTE: If you want the full beginning and ending years, you could just call the MonthsInYears method with 1 and 12 for the from and to months.
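The dictionary variant above can be tried without a database. The following is a minimal in-memory sketch, not the original query: the class name MonthFill and the method CountsPerMonth are hypothetical, and it assumes the dates have already been loaded into memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MonthFill
{
    // Enumerates every (Year, Month) pair from the start month to the end month, inclusive.
    public static IEnumerable<(int Year, int Month)> MonthsInYears(
        int fromYear, int fromMonth, int toYear, int toMonth)
    {
        for (int year = fromYear; year <= toYear; ++year)
            for (int month = (year == fromYear ? fromMonth : 1);
                 month <= (year == toYear ? toMonth : 12);
                 ++month)
                yield return (year, month);
    }

    // Hypothetical helper: counts dates per month and returns one entry per month
    // between the earliest and latest month seen, using 0 for months with no dates.
    public static int[] CountsPerMonth(IEnumerable<DateTime> dates)
    {
        var counts = dates
            .GroupBy(d => (d.Year, d.Month))
            .ToDictionary(g => g.Key, g => g.Count());
        if (counts.Count == 0)
            return Array.Empty<int>();

        // Value tuples compare element by element, so Min/Max give the first and last month.
        var (firstYear, firstMonth) = counts.Keys.Min();
        var (lastYear, lastMonth) = counts.Keys.Max();

        return MonthsInYears(firstYear, firstMonth, lastYear, lastMonth)
            .Select(ym => counts.TryGetValue(ym, out var n) ? n : 0)
            .ToArray();
    }
}
```

Because the result has exactly one entry per calendar month, its indexes stay aligned with the chart's month axis even when some months have no records.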

Related

Get Group sum not using group.Sum() in linq

The following query works, but I want to get the same result without using grp.Sum(). Can we do it?
(from item in (await VehicleReplaceCostDataAsync())
group item by (item.type, item.size, item.ADA, item.eseq) into grp
orderby (grp.Key.eseq, grp.Key.size, grp.Key.ADA)
select new VehicleReplacementCost
{
type = grp.Key.type,
size = grp.Key.size,
ADA = grp.Key.ADA,
count = grp.Sum(x => x.count),
cost = grp.Sum(x => x.cost),
Fcount = grp.Sum(x => x.Fcount),
Fcost = grp.Sum(x => x.Fcost),
eseq = grp.Key.eseq,
}).ToList();
Perhaps by using .Aggregate()?
count = grp.Aggregate(0, (a, b) => a + b.count)
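If several totals are needed per group, a single Aggregate call with a tuple accumulator can compute them in one pass over each group. This is only a sketch on in-memory data: the Row and Total types and the GroupTotals class are hypothetical stand-ins, not the types from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupTotals
{
    // Hypothetical row and result shapes, for illustration only.
    public record Row(string Type, int Count, decimal Cost);
    public record Total(string Type, int Count, decimal Cost);

    public static List<Total> Totals(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.Type)
            .OrderBy(g => g.Key)
            .Select(g =>
            {
                // The seed (0, 0m) is the running total; each row is added to both fields.
                var (count, cost) = g.Aggregate(
                    (Count: 0, Cost: 0m),
                    (acc, r) => (acc.Count + r.Count, acc.Cost + r.Cost));
                return new Total(g.Key, count, cost);
            })
            .ToList();
}
```

This runs as LINQ to Objects; whether a database provider can translate Aggregate is a separate question and is not assumed here.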
Thanks for the answer from Astrid. It looks like a good one, but I didn't test it. My colleague gave this solution instead by using yield:
var groups = costs
.GroupBy(type => (type.SystemId, type.Type, type.Size, type.ADA, type.Eseq))
.OrderBy(group => (group.Key.SystemId, group.Key.Eseq, group.Key.Size, group.Key.ADA));
foreach (var group in groups)
{
var result = new ProgramGuideVehicleCostRow
{
SystemId = group.Key.SystemId,
Type = group.Key.Type,
Size = group.Key.Size,
ADA = group.Key.ADA,
};
foreach (var row in group)
{
result.Cost += row.Cost;
result.Fcost += row.Fcost;
result.Count += row.Count;
result.Fcount += row.Fcount;
}
yield return result;
}

Date filtering in Linq

Hi I'm trying to filter my result with a date.
What I tried so far:
var lastYear = (DateTime.Now.Year) - 1;
var salesLastYear = _documentService.GetDocuments(
d => d.DocumentTypeId == saleDocumentId &&
d.EffectiveOnUtc.Contains(lastYear))
.Select(d => d.Id).ToList();
var salesLastYear = _documentService.GetDocuments(
d => d.DocumentTypeId == saleDocumentId &&
(d.EffectiveOnUtc.Year == lastYear))
.Select(d => d.Id).ToList();
Both gave no errors in Visual Studio, but did raise an exception during execution.
I also tried converting both values to strings, but that failed as well.
Working on the assumption that your EffectiveOnUtc is a DateTime, and you want to filter to records within the previous calendar year:
int lastYear = DateTime.Now.Year - 1;
DateTime minDate = new DateTime(lastYear, 1, 1);
DateTime maxDate = minDate.AddYears(1);
var salesLastYear = _documentService.GetDocuments(
d => d.DocumentTypeId == saleDocumentId
&& d.EffectiveOnUtc >= minDate
&& d.EffectiveOnUtc < maxDate)
.Select(d => d.Id).ToList();
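The range in the answer above is half-open: the lower bound is included and the upper bound is excluded, so a timestamp late on 31 December still matches while 1 January of the current year does not. A small in-memory sketch of that boundary logic follows; the class and method names are hypothetical.

```csharp
using System;

public static class DateRangeFilter
{
    // Returns the half-open range [1 Jan of the previous year, 1 Jan of the current year).
    public static (DateTime Min, DateTime Max) PreviousCalendarYear(DateTime now)
    {
        var min = new DateTime(now.Year - 1, 1, 1);
        return (min, min.AddYears(1));
    }

    // True when value falls inside the range: lower bound inclusive, upper bound exclusive.
    public static bool InRange(DateTime value, (DateTime Min, DateTime Max) range) =>
        value >= range.Min && value < range.Max;
}
```

Plain comparisons against two precomputed DateTime values are generally something a LINQ provider can translate, which is the likely reason this form avoids the runtime exception, although that depends on the provider behind GetDocuments.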

LINQ and 2 datatables

I have 2 datatables in a dataset. One table holds a list called CostTypes, with just an Id and a Description field.
The other datatable is the master table and has many records; one of its columns is the cost type. There will be cost types that are not referenced in this datatable. There is another column in this datatable called cost.
What I am trying to do is get a summary by cost type with a total of the cost. But I want ALL cost types listed; any cost type not in the master table should show zero.
CostType table
Id, Description
1,Marketing
2,Sales
3,Production
4,Service
Master table
Id, Cost, CostTypeId
1,10,1
2,120,1
3,40,3
So I would like to see a result in a datatable (if possible) so I can bind it to a DataGridView:
Marketing 130
Sales 0
Production 40
Service 0
Thanks for the help everyone, this is what I came up with from the answers. Can anyone suggest any improvements?
Also, how can I convert the result in query1 into a datatable?
var query1 =
from rowCT in costTypes.AsEnumerable()
from rowSTD in stdRates.AsEnumerable()
.Where( d => d.Field<int?>( "CostTypeId" ) == rowCT.Field<int?>( "CostTypeId" ) )
.DefaultIfEmpty()
group new { row0 = rowCT, row1 = rowSTD }
by rowCT.Field<string>( "Description" ) into g
select new
{
g.Key,
Cost = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "Cost" ) ),
TotalCost = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "TotalCost" ) ),
TotalHours = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "TotalHours" ) ),
TotalLabourCost = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "TotalLabourCost" ) )
}
;
Maybe something like this:
Test data:
DataTable dt=new DataTable();
dt.Columns.Add("Id",typeof(int));
dt.Columns.Add("Description",typeof(string));
dt.Rows.Add(1,"Marketing");
dt.Rows.Add(2,"Sales");
dt.Rows.Add(3,"Production");
dt.Rows.Add(4,"Service");
DataTable dt2=new DataTable();
dt2.Columns.Add("Id",typeof(int));
dt2.Columns.Add("Cost",typeof(int));
dt2.Columns.Add("CostTypeId",typeof(int));
dt2.Rows.Add(1,10,1);
dt2.Rows.Add(2,120,1);
dt2.Rows.Add(3,40,3);
Linq query
var query=(
from row in dt.AsEnumerable()
from row1 in dt2.AsEnumerable()
.Where (d =>d.Field<int>("CostTypeId")==row.Field<int>("Id") )
.DefaultIfEmpty()
group new{row,row1}
by row.Field<string>("Description") into g
select new
{
g.Key,
Cost=g.Sum (x =>x.row1==null?0:x.row1.Field<int>("Cost"))
}
);
Result
Key Cost
Marketing 130
Sales 0
Production 40
Service 0
You can use the Sum extension method to compute the cost. It will return 0 if the collection is empty which is exactly what you want:
var costTypes = new DataTable("CostTypes");
costTypes.Columns.Add("Id", typeof(Int32));
costTypes.Columns.Add("Description", typeof(String));
costTypes.Rows.Add(1, "Marketing");
costTypes.Rows.Add(2, "Sales");
costTypes.Rows.Add(3, "Production");
costTypes.Rows.Add(4, "Service");
var costEntries = new DataTable("CostEntries");
costEntries.Columns.Add("Id", typeof(Int32));
costEntries.Columns.Add("Cost", typeof(Int32));
costEntries.Columns.Add("CostTypeId", typeof(Int32));
costEntries.Rows.Add(1, 10, 1);
costEntries.Rows.Add(2, 120, 1);
costEntries.Rows.Add(3, 40, 3);
var costs = costTypes
.Rows
.Cast<DataRow>()
.Select(
dr => new {
Id = dr.Field<Int32>("Id"),
Description = dr.Field<String>("Description")
}
)
.Select(
ct => new {
ct.Description,
TotalCost = costEntries
.Rows
.Cast<DataRow>()
.Where(ce => ce.Field<Int32>("CostTypeId") == ct.Id)
.Sum(ce => ce.Field<Int32>("Cost"))
}
);
The result is:
Description|TotalCost
-----------+---------
Marketing | 130
Sales | 0
Production | 40
Service | 0
You can create a new DataTable from the result quite simply:
var costsDataTable = new DataTable("Costs");
costsDataTable.Columns.Add("Description", typeof(String));
costsDataTable.Columns.Add("TotalCost", typeof(Int32));
foreach (var cost in costs)
costsDataTable.Rows.Add(cost.Description, cost.TotalCost);
If the linear search performed by the Where in the code above is a concern you can improve the performance by creating a lookup table in advance:
var costEntriesLookup = costEntries
.Rows
.Cast<DataRow>()
.Select(
ce => new {
Cost = ce.Field<Int32>("Cost"),
CostTypeId = ce.Field<Int32>("CostTypeId")
}
)
.ToLookup(ce => ce.CostTypeId, ce => ce.Cost);
var costs = costTypes
.Rows
.Cast<DataRow>()
.Select(
dr => new {
Id = dr.Field<Int32>("Id"),
Description = dr.Field<String>("Description")
}
)
.Select(
ct => new {
ct.Description,
TotalCost = costEntriesLookup.Contains(ct.Id)
? costEntriesLookup[ct.Id].Sum()
: 0
}
);
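As a side note, the Contains check may not be needed: the lookup returned by ToLookup yields an empty sequence for a key it does not contain, and Sum over an empty sequence of int is 0. The sketch below shows this on plain in-memory values; the class name, method name, and tuple shape are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupTotals
{
    // Returns one total per requested cost type id, in the order the ids are given.
    public static int[] Totals(
        IEnumerable<int> costTypeIds,
        IEnumerable<(int Cost, int CostTypeId)> entries)
    {
        // Built once: maps each CostTypeId to the costs recorded against it.
        var costsByType = entries.ToLookup(e => e.CostTypeId, e => e.Cost);

        // Indexing a missing key yields an empty sequence, whose Sum is 0.
        return costTypeIds.Select(id => costsByType[id].Sum()).ToArray();
    }
}
```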
I came up with a simpler bit of linq than others seemed to use. Thanks to Martin Liversage for the code to create the input data.
var costTypes = new DataTable("CostTypes");
costTypes.Columns.Add("Id", typeof(Int32));
costTypes.Columns.Add("Description", typeof(String));
costTypes.Rows.Add(1, "Marketing");
costTypes.Rows.Add(2, "Sales");
costTypes.Rows.Add(3, "Production");
costTypes.Rows.Add(4, "Service");
var costEntries = new DataTable("CostEntries");
costEntries.Columns.Add("Id", typeof(Int32));
costEntries.Columns.Add("Cost", typeof(Int32));
costEntries.Columns.Add("CostTypeId", typeof(Int32));
costEntries.Rows.Add(1, 10, 1);
costEntries.Rows.Add(2, 120, 1);
costEntries.Rows.Add(3, 40, 3);
var cte = costTypes.Rows.Cast<DataRow>();
var cee = costEntries.Rows.Cast<DataRow>();
var output = cte.Select(
ct => new {
Description = ct["Description"],
Sum = cee.Where(ce=>ce["CostTypeId"].Equals(ct["Id"])).Sum(ce=>(int)ce["Cost"])
}
);
This may lose efficiency on larger tables since each cost type will search the cost entry table whereas using grouping I suspect you only need one pass over the table. Personally I'd prefer the (to my mind) simpler looking code. It will depend on your use case though.
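If the repeated scan is a concern, one possible middle ground is GroupJoin, which indexes the cost entries once and then pairs each cost type with its matches, including cost types that have none. The sketch below also writes the result into a DataTable, since the question asked for that. The class and method names are hypothetical, and the column names are assumed to match the test data above.

```csharp
using System.Data;
using System.Linq;

public static class CostSummary
{
    public static DataTable Summarize(DataTable costTypes, DataTable costEntries)
    {
        var result = new DataTable("Costs");
        result.Columns.Add("Description", typeof(string));
        result.Columns.Add("TotalCost", typeof(int));

        // GroupJoin behaves like a left join: every cost type appears once,
        // with a (possibly empty) sequence of matching cost entries.
        var rows = costTypes.Rows.Cast<DataRow>().GroupJoin(
            costEntries.Rows.Cast<DataRow>(),
            ct => (int)ct["Id"],
            ce => (int)ce["CostTypeId"],
            (ct, matches) => new
            {
                Description = (string)ct["Description"],
                TotalCost = matches.Sum(ce => (int)ce["Cost"]) // 0 when there are no matches
            });

        foreach (var row in rows)
            result.Rows.Add(row.Description, row.TotalCost);
        return result;
    }
}
```

The returned table can then be assigned to a grid's DataSource in the usual way.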

Error trying to exclude records with a JOIN to another object

In my code below, is there any way I can use the results in the object 'WasteRecordsExcluded' to join with searchResults, essentially excluding the WasteId's I don't want.
If I debug to the last line I get the error :
base {System.SystemException} = {"The query contains references to items defined on a different data context."}
Or, if joining is impossible, I could change bHazardous from TRUE to FALSE and FALSE to TRUE and do some kind of 'NOT IN' comparison.
Going bananas with this one, can anyone help? Kind regards:
var allWaste = _securityRepository.FindAllWaste(userId, SystemType.W);
var allWasteIndicatorItems = _securityRepository.FindAllWasteIndicatorItems();
// First get all WASTE RECORDS
var searchResults = (from s in allWaste
join x in allWasteIndicatorItems on s.WasteId equals x.WasteId
where (s.Description.Contains(searchText)
&& s.Site.SiteDescription.EndsWith(searchTextSite)
&& (s.CollectedDate >= startDate && s.CollectedDate <= endDate))
&& x.EWC.EndsWith(searchTextEWC)
select s).Distinct();
var results = searchResults;
if (hazardous != "-1")
{
// User has requested to filter on Hazardous or Non Hazardous only rather than Show All
var WasteRecordsExcluded = (from we in _db.WasteIndicatorItems
.Join(_db.WasteIndicators, wii => wii.WasteIndicatorId, wi => wi.WasteIndicatorId, (wii, wi) => new { wasteid = wii.WasteId, wasteindicatorid = wii.WasteIndicatorId, hazardtypeid = wi.HazardTypeId })
.Join(_db.HazardTypes, w => w.hazardtypeid, h => h.HazardTypeId, (w, h) => new { wasteid = w.wasteid, hazardous = h.Hazardous })
.GroupBy(g => new { g.wasteid, g.hazardous })
.Where(g => g.Key.hazardous == bHazardous && g.Count() >= 1)
select we);
// Now join the 2 object to eliminate all the keys that do not apply
results = results.Where(n => WasteRecordsExcluded.All(t2 => n.WasteId == t2.Key.wasteid));
}
return results;
Maybe something like this:
.....
var results = searchResults.ToList();
.....
.....
.Where(g => g.Key.hazardous == bHazardous && g.Count() >= 1)
select we).ToList();
.....
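The ToList() calls suggested above bring both result sets into memory, so the final filter runs as LINQ to Objects and no longer mixes two data contexts in one query. Once the ids to exclude are in memory, the 'NOT IN' step could look roughly like the sketch below. It is generic and uses hypothetical names, since the real entity types are not shown in full.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExcludeById
{
    // Keeps only the items whose id is NOT in excludedIds (an in-memory 'NOT IN').
    public static List<T> Exclude<T>(
        IEnumerable<T> items,
        Func<T, int> idOf,
        IEnumerable<int> excludedIds)
    {
        // A HashSet gives constant-time membership checks per item.
        var excluded = new HashSet<int>(excludedIds);
        return items.Where(item => !excluded.Contains(idOf(item))).ToList();
    }
}
```

With the names from the question, a call might look like ExcludeById.Exclude(searchResults.ToList(), w => w.WasteId, excludedWasteIds), where excludedWasteIds is an assumed list of ids taken from the grouped query. Materializing large tables has a memory cost, so this suits modest result sets.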

Something wrong with my LINQ to Entities query

I currently have a LINQ query:
public List<EventSchool> GetEventSchools(int eventID)
{
var eventSchools = db.EventSchools
.Include("Organisation")
.Where(e => e.EventID == eventID)
.ToList();
foreach (var ev in eventSchools)
{
if (db.EventSchoolKeyStages.Where(e => e.EventSchoolID == ev.EventSchoolID).Count() > 0)
{
int ks = db.EventSchoolKeyStages
.Where(e => ev.EventSchoolID == ev.EventSchoolID)
.Sum(e => e.Males + e.Females);
ev.StudentNumbers = ks;
}
}
return eventSchools;
}
When I inspect EventSchools, the student numbers for ALL items in the list shows as the first total.
For example, if I have 3 items in the list:
Item 1 - Males = 10, Females = 10
Item 2 - Males = 1, Females = 2
Item 3 - Males = 200, Females = 500
ALL items have a StudentNumbers of 20, rather than:
Item 1 - 20
Item 2 - 3
Item 3 - 700
Not sure what I'm doing wrong?
You have a typo here:
.Where(e => ev.EventSchoolID == ev.EventSchoolID)
This lambda will always be true. I suspect you meant
.Where(e => e.EventSchoolID == ev.EventSchoolID)
The only difference is the left-hand operand of the comparison: e.EventSchoolID instead of ev.EventSchoolID.
You have an error in your query:
.Where(e => ev.EventSchoolID == ev.EventSchoolID)
Should be:
.Where(e => e.EventSchoolID == ev.EventSchoolID)
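Separately from the typo, the loop issues up to two queries per school. One alternative, sketched here on in-memory data with a hypothetical KeyStage record standing in for the entity, is to compute all totals in a single grouped pass and then look each school up.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StudentTotals
{
    // Hypothetical stand-in for an EventSchoolKeyStages row.
    public record KeyStage(int EventSchoolID, int Males, int Females);

    // Maps each EventSchoolID to the sum of Males + Females over its key stages.
    public static Dictionary<int, int> TotalsBySchool(IEnumerable<KeyStage> keyStages) =>
        keyStages
            .GroupBy(k => k.EventSchoolID)
            .ToDictionary(g => g.Key, g => g.Sum(k => k.Males + k.Females));
}
```

A school with no key stage rows has no entry in the dictionary, so TryGetValue is the safe way to read a total when assigning StudentNumbers.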

Resources