Get group sum without using group.Sum() in LINQ

The following query works, but I want to get the same result without using grp.Sum(). Can we do it?
(from item in (await VehicleReplaceCostDataAsync())
group item by (item.type, item.size, item.ADA, item.eseq) into grp
orderby (grp.Key.eseq, grp.Key.size, grp.Key.ADA)
select new VehicleReplacementCost
{
type = grp.Key.type,
size = grp.Key.size,
ADA = grp.Key.ADA,
count = grp.Sum(x => x.count),
cost = grp.Sum(x => x.cost),
Fcount = grp.Sum(x => x.Fcount),
Fcost = grp.Sum(x => x.Fcost),
eseq = grp.Key.eseq,
}).ToList();

Perhaps by using .Aggregate()?
count = grp.Aggregate(0, (a, b) => a + b.count)
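As a rough sketch of how Aggregate could replace several Sum calls at once, the seed can be a tuple holding the key fields plus running totals, so each group is folded in a single pass. The sample rows and the reduced set of fields below are hypothetical stand-ins, not the real output of VehicleReplaceCostDataAsync():

```csharp
using System;
using System.Linq;

// Hypothetical stand-in rows; the real data would come from VehicleReplaceCostDataAsync().
var items = new[]
{
    (type: "Bus", size: 40, count: 2, cost: 100m),
    (type: "Bus", size: 40, count: 3, cost: 250m),
    (type: "Van", size: 20, count: 1, cost: 60m),
};

var totals = items
    .GroupBy(i => (i.type, i.size))
    .Select(g => g.Aggregate(
        // Seed: the key fields plus zeroed running totals.
        (type: g.Key.type, size: g.Key.size, count: 0, cost: 0m),
        // Fold each row of the group into the running totals.
        (acc, row) => (type: acc.type, size: acc.size,
                       count: acc.count + row.count,
                       cost: acc.cost + row.cost)))
    .ToList();

foreach (var t in totals)
    Console.WriteLine($"{t.type} {t.size}: count={t.count}, cost={t.cost}");
```

This only works on in-memory sequences (LINQ to Objects); a database provider is unlikely to translate Aggregate.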

Thanks to Astrid for the answer. It looks like a good one, but I haven't tested it. My colleague suggested this solution instead, using yield return:
var groups = costs
.GroupBy(type => (type.SystemId, type.Type, type.Size, type.ADA, type.Eseq))
.OrderBy(group => (group.Key.SystemId, group.Key.Eseq, group.Key.Size, group.Key.ADA));
foreach (var group in groups)
{
var result = new ProgramGuideVehicleCostRow
{
SystemId = group.Key.SystemId,
Type = group.Key.Type,
Size = group.Key.Size,
ADA = group.Key.ADA,
};
foreach (var row in group)
{
result.Cost += row.Cost;
result.Fcost += row.Fcost;
result.Count += row.Count;
result.Fcount += row.Fcount;
}
yield return result;
}

Related

How to group and count missing values using linq

How do I go about counting missing values using LINQ? I am counting occurrences per month, and I want the count to show as zero when a month has no entries.
However, if there are currently no entries for a month, the array skips that month, as shown at index 5 below. I don't want this to happen because I am plotting the results on a chart, and from index 5 onwards the skipped dates are out of sync with the actual counts.
Below is my LINQ query:
var veterans = _db.Records
.Where(j => j.Requestor == "Veterans" && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) >= 0 && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) <= 24)
.GroupBy(g => new { g.Request_Date.Value.Year, g.Request_Date.Value.Month }).OrderBy(d => d.Key.Year).ThenBy(d => d.Key.Month)
.Select(group => new
{
Dates = group.Key,
Count = group.Count()
});
var veteransCount = veterans.Select(n => n.Count).ToArray();
You can create a helper function to generate the month+year enumeration you need:
public static IEnumerable<(int Year,int Month)> MonthsInYears(int fromYear, int fromMonth, int toYear, int toMonth) {
for (int year = fromYear; year <= toYear; ++year)
for (int month = (year == fromYear ? fromMonth : 1); month <= (year == toYear ? toMonth : 12); ++month)
yield return (year, month);
}
Then using this, you can create an enumeration of the period:
var veterans = _db.Records
.Where(j => j.Requestor == "Veterans" && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) >= 0 && EF.Functions.DateDiffMonth(j.Request_Date, DateTime.Now) <= 24)
.GroupBy(g => new { g.Request_Date.Value.Year, g.Request_Date.Value.Month }).OrderBy(d => d.Key.Year).ThenBy(d => d.Key.Month)
.Select(group => new {
YearMonth = group.Key,
Count = group.Count()
});
var minYearMonth = veterans.Select(v => v.YearMonth).First();
var maxYearMonth = veterans.Select(v => v.YearMonth).Last();
var monthsInYears = MonthsInYears(minYearMonth.Year, minYearMonth.Month, maxYearMonth.Year, maxYearMonth.Month);
Then you can GroupJoin (as a left join) to your database data:
var veteransCount = monthsInYears.GroupJoin(
veterans,
ym => new { ym.Year, ym.Month },
v => v.YearMonth,
(ym, sj) => sj.FirstOrDefault()?.Count ?? 0)
.ToArray();
Alternatively, since this is a specific case, you could create a Dictionary for your source data and lookup each enumeration value:
var veteransMap = veterans.ToDictionary(v => v.YearMonth, v => v.Count);
var veteransCount2 = monthsInYears.Select(ym => veteransMap.TryGetValue(new { ym.Year, ym.Month }, out var count) ? count : 0)
.ToArray();
NOTE: If you want the full beginning and ending years, you could just call the MonthsInYears method with 1 and 12 for the from and to months.
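To see the gap filling in isolation, here is a minimal in-memory sketch of the dictionary variant. The counts dictionary is a hypothetical stand-in for the grouped database result; MonthsInYears is the helper from above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical counts standing in for the grouped query result.
// Note that 2023-12 has no entry.
var counts = new Dictionary<(int Year, int Month), int>
{
    [(2023, 11)] = 4,
    [(2024, 1)] = 2,
    [(2024, 2)] = 7,
};

// Walk every month in the period and fall back to 0 for missing months.
var filled = MonthsInYears(2023, 11, 2024, 2)
    .Select(ym => counts.TryGetValue(ym, out var c) ? c : 0)
    .ToArray();

Console.WriteLine(string.Join(", ", filled)); // prints "4, 0, 2, 7"

static IEnumerable<(int Year, int Month)> MonthsInYears(int fromYear, int fromMonth, int toYear, int toMonth)
{
    for (int year = fromYear; year <= toYear; ++year)
        for (int month = (year == fromYear ? fromMonth : 1); month <= (year == toYear ? toMonth : 12); ++month)
            yield return (year, month);
}
```

Because the period here is taken from the first and last months that have data, empty months at either end of the 24-month window would still be missing; passing the window boundaries to MonthsInYears directly would cover them.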

Linq query averages

I can run this in LINQPad and it works fine, but when I run it in Visual Studio the query throws errors because of the Avg, Max and Min expressions. Can anyone advise how I need to change this to get the desired result?
tickets = from t in dbContext.TblOmTasks
join o in dbContext.TblOms on t.OMID equals o.OMID
join ls in dbContext.LkpStatusBasics on t.OMTaskStatus equals ls.ID
where t.OMID == SiteId
where ls.Status.Contains(status)
group t by new { Y = t.Created.Value.Date.Year, M = t.Created.Value.Date.Month } into grp
orderby grp.Key.M
select new TBS
{
Month = new DateTime(grp.Key.Y, grp.Key.M, 1).ToString("MMM", CultureInfo.InvariantCulture)
,Avg = grp.Average(g => Convert.ToInt32((g.Updated.HasValue ? g.Updated - g.Created : DateTime.Now - g.Created).Value.Days))
,Max = grp.Max(g => Convert.ToInt32((g.Updated.HasValue ? g.Updated - g.Created : DateTime.Now - g.Created).Value.Days))
,Min = grp.Min(g => Convert.ToInt32((g.Updated.HasValue ? g.Updated - g.Created : DateTime.Now - g.Created).Value.Days))
};
It looks like EF Core still cannot translate TimeSpan.Days, so use the appropriate database function, EF.Functions.DateDiffDay, instead.
tickets =
from t in dbContext.TblOmTasks
join o in dbContext.TblOms on t.OMID equals o.OMID
join ls in dbContext.LkpStatusBasics on t.OMTaskStatus equals ls.ID
where t.OMID == SiteId
where ls.Status.Contains(status)
group t by new { Y = t.Created.Value.Date.Year, M = t.Created.Value.Date.Month } into grp
orderby grp.Key.M
select new TBS
{
Month = new DateTime(grp.Key.Y, grp.Key.M, 1).ToString("MMM", CultureInfo.InvariantCulture),
Avg = grp.Average(g => EF.Functions.DateDiffDay(g.Created, g.Updated ?? DateTime.Now)),
Max = grp.Max(g => EF.Functions.DateDiffDay(g.Created, g.Updated ?? DateTime.Now)),
Min = grp.Min(g => EF.Functions.DateDiffDay(g.Created, g.Updated ?? DateTime.Now))
};

How do I optimize this LINQ query? It runs on localhost but it doesn't run on Azure

I have this LINQ query and am getting results I need. However it takes 5-6 seconds to show results on localhost, and I can't even run this on Azure.
I'm new to LINQ, and I'm sure that I'm doing something inefficient.
Could someone direct me to optimize?
var joblist = (from t in db.Tracking
group t by t.JobNumber into j
let id = j.Max(x => x.ScanDate)
select new
{
jn = j.Key,
ti = j.FirstOrDefault(y => y.ScanDate == id).TrackingId,
sd = j.FirstOrDefault(y => y.ScanDate == id).ScanDate,
lc = j.FirstOrDefault(y => y.ScanDate == id).LocationId
}).Where(z => z.lc == lid).Where(z => z.jn != null);
jfilter = (from tr in joblist
join lc in db.Location on tr.lc equals lc.LocationId
join lt in db.LocType on lc.LocationType equals lt.LocationType
select new ScanMod
{
TrackingId = tr.ti,
LocationName = lc.LocationName,
JobNumber = tr.jn,
LocationTypeName = lt.LocationTypeName,
ScanDate = tr.sd,
StoneId = ""
}).OrderByDescending(z => z.ScanDate);
UPDATE:
This query runs on Azure (S1), but it takes 30 seconds. The table has 500,000 rows, and I assume that OrderByDescending or FirstOrDefault is killing it...
var joblist = db.Tracking
.GroupBy(j => j.JobNumber)
.Select(g => g.OrderByDescending(j => j.ScanDate).FirstOrDefault());
jfilter = (from tr in joblist
join lc in db.Location on tr.LocationId equals lc.LocationId
join lt in db.LocType on lc.LocationType equals lt.LocationType
where tr.LocationId == lid
select new ScanMod
{
TrackingId = tr.TrackingId,
LocationName = lc.LocationName,
JobNumber = tr.JobNumber,
LocationTypeName = lt.LocationTypeName,
ScanDate = tr.ScanDate,
StoneId = ""
}).OrderByDescending(z => z.ScanDate);

LINQ and 2 datatables

I have 2 datatables in a dataset. One table has a list called CostTypes. Just an Id and Description field.
The other datatable is the master table. It has many records, and one of its columns is the cost type. There will be cost types that are not referenced in this datatable. Another column in this datatable is called cost.
What I am trying to do is get a summary by cost type with a total of the cost. But I want ALL cost types listed; any cost type not in the master table should show zero.
CostType table
Id, Description
1,Marketing
2,Sales
3,Production
4,Service
Master table
Id, Cost, CostTypeId
1,10,1
2,120,1
3,40,3
So I would like to see a result in a datatable (if possible) so I can bind it to a datagridview:
Marketing 130
Sales 0
Production 40
Service 0
Thanks for the help everyone, this is what I came up with from the answers. Can anyone suggest any improvements?
Also, how can I convert the result in query1 into a datatable?
var query1 =
from rowCT in costTypes.AsEnumerable()
from rowSTD in stdRates.AsEnumerable()
.Where( d => d.Field<int?>( "CostTypeId" ) == rowCT.Field<int?>( "CostTypeId" ) )
.DefaultIfEmpty()
group new { row0 = rowCT, row1 = rowSTD }
by rowCT.Field<string>( "Description" ) into g
select new
{
g.Key,
Cost = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "Cost" ) ),
TotalCost = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "TotalCost" ) ),
TotalHours = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "TotalHours" ) ),
TotalLabourCost = g.Sum( x => x.row1 == null ? 0 : x.row1.Field<decimal>( "TotalLabourCost" ) )
}
;
Maybe something like this:
Test data:
DataTable dt=new DataTable();
dt.Columns.Add("Id",typeof(int));
dt.Columns.Add("Description",typeof(string));
dt.Rows.Add(1,"Marketing");
dt.Rows.Add(2,"Sales");
dt.Rows.Add(3,"Production");
dt.Rows.Add(4,"Service");
DataTable dt2=new DataTable();
dt2.Columns.Add("Id",typeof(int));
dt2.Columns.Add("Cost",typeof(int));
dt2.Columns.Add("CostTypeId",typeof(int));
dt2.Rows.Add(1,10,1);
dt2.Rows.Add(2,120,1);
dt2.Rows.Add(3,40,3);
Linq query
var query=(
from row in dt.AsEnumerable()
from row1 in dt2.AsEnumerable()
.Where (d =>d.Field<int>("CostTypeId")==row.Field<int>("Id") )
.DefaultIfEmpty()
group new{row,row1}
by row.Field<string>("Description") into g
select new
{
g.Key,
Cost=g.Sum (x =>x.row1==null?0:x.row1.Field<int>("Cost"))
}
);
Result
Key Cost
Marketing 130
Sales 0
Production 40
Service 0
You can use the Sum extension method to compute the cost. It will return 0 if the collection is empty which is exactly what you want:
var costTypes = new DataTable("CostTypes");
costTypes.Columns.Add("Id", typeof(Int32));
costTypes.Columns.Add("Description", typeof(String));
costTypes.Rows.Add(1, "Marketing");
costTypes.Rows.Add(2, "Sales");
costTypes.Rows.Add(3, "Production");
costTypes.Rows.Add(4, "Service");
var costEntries = new DataTable("CostEntries");
costEntries.Columns.Add("Id", typeof(Int32));
costEntries.Columns.Add("Cost", typeof(Int32));
costEntries.Columns.Add("CostTypeId", typeof(Int32));
costEntries.Rows.Add(1, 10, 1);
costEntries.Rows.Add(2, 120, 1);
costEntries.Rows.Add(3, 40, 3);
var costs = costTypes
.Rows
.Cast<DataRow>()
.Select(
dr => new {
Id = dr.Field<Int32>("Id"),
Description = dr.Field<String>("Description")
}
)
.Select(
ct => new {
ct.Description,
TotalCost = costEntries
.Rows
.Cast<DataRow>()
.Where(ce => ce.Field<Int32>("CostTypeId") == ct.Id)
.Sum(ce => ce.Field<Int32>("Cost"))
}
);
The result is:
Description|TotalCost
-----------+---------
Marketing | 130
Sales | 0
Production | 40
Service | 0
You can create a new DataTable from the result quite simply:
var costsDataTable = new DataTable("Costs");
costsDataTable.Columns.Add("Description", typeof(String));
costsDataTable.Columns.Add("TotalCost", typeof(Int32));
foreach (var cost in costs)
costsDataTable.Rows.Add(cost.Description, cost.TotalCost);
If the linear search performed by the Where in the code above is a concern, you can improve performance by creating a lookup table in advance:
var costEntriesLookup = costEntries
.Rows
.Cast<DataRow>()
.Select(
ce => new {
Cost = ce.Field<Int32>("Cost"),
CostTypeId = ce.Field<Int32>("CostTypeId")
}
)
.ToLookup(ce => ce.CostTypeId, ce => ce.Cost);
var costs = costTypes
.Rows
.Cast<DataRow>()
.Select(
dr => new {
Id = dr.Field<Int32>("Id"),
Description = dr.Field<String>("Description")
}
)
.Select(
ct => new {
ct.Description,
TotalCost = costEntriesLookup.Contains(ct.Id)
? costEntriesLookup[ct.Id].Sum()
: 0
}
);
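As a side note on the lookup version: the lookup returned by ToLookup yields an empty sequence when indexed with a key it does not contain, so the Contains check can probably be dropped. A minimal sketch, using hypothetical tuple rows in place of the DataTable:

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the CostEntries table.
var entries = new[]
{
    (Cost: 10, CostTypeId: 1),
    (Cost: 120, CostTypeId: 1),
    (Cost: 40, CostTypeId: 3),
};

var lookup = entries.ToLookup(e => e.CostTypeId, e => e.Cost);

int marketing = lookup[1].Sum(); // two entries for cost type 1
int sales = lookup[2].Sum();     // no entries: the indexer yields an empty sequence, so Sum is 0

Console.WriteLine($"Marketing {marketing}, Sales {sales}");
```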
I came up with a simpler bit of linq than others seemed to use. Thanks to Martin Liversage for the code to create the input data.
var costTypes = new DataTable("CostTypes");
costTypes.Columns.Add("Id", typeof(Int32));
costTypes.Columns.Add("Description", typeof(String));
costTypes.Rows.Add(1, "Marketing");
costTypes.Rows.Add(2, "Sales");
costTypes.Rows.Add(3, "Production");
costTypes.Rows.Add(4, "Service");
var costEntries = new DataTable("CostEntries");
costEntries.Columns.Add("Id", typeof(Int32));
costEntries.Columns.Add("Cost", typeof(Int32));
costEntries.Columns.Add("CostTypeId", typeof(Int32));
costEntries.Rows.Add(1, 10, 1);
costEntries.Rows.Add(2, 120, 1);
costEntries.Rows.Add(3, 40, 3);
var cte = costTypes.Rows.Cast<DataRow>();
var cee = costEntries.Rows.Cast<DataRow>();
var output = cte.Select(
ct => new {
Description = ct["Description"],
Sum = cee.Where(ce=>ce["CostTypeId"].Equals(ct["Id"])).Sum(ce=>(int)ce["Cost"])
}
);
This may lose efficiency on larger tables, since each cost type searches the whole cost entry table, whereas with grouping I suspect you need only one pass over the table. Personally I prefer the (to my mind) simpler-looking code, but it will depend on your use case.

Error trying to exclude records with a JOIN to another object

In my code below, is there any way I can use the results in the object 'WasteRecordsExcluded' to join with searchResults, essentially excluding the WasteIds I don't want?
If I debug to the last line I get the error:
base {System.SystemException} = {"The query contains references to items defined on a different data context."}
Or, if joining is impossible, I could swap bHazardous (TRUE to FALSE and FALSE to TRUE) and do some kind of 'NOT IN' comparison.
Going bananas with this one. Can anyone help? Kind regards.
var allWaste = _securityRepository.FindAllWaste(userId, SystemType.W);
var allWasteIndicatorItems = _securityRepository.FindAllWasteIndicatorItems();
// First get all WASTE RECORDS
var searchResults = (from s in allWaste
join x in allWasteIndicatorItems on s.WasteId equals x.WasteId
where (s.Description.Contains(searchText)
&& s.Site.SiteDescription.EndsWith(searchTextSite)
&& (s.CollectedDate >= startDate && s.CollectedDate <= endDate))
&& x.EWC.EndsWith(searchTextEWC)
select s).Distinct();
var results = searchResults;
if (hazardous != "-1")
{
// User has requested to filter on Hazardous or Non Hazardous only rather than Show All
var WasteRecordsExcluded = (from we in _db.WasteIndicatorItems
.Join(_db.WasteIndicators, wii => wii.WasteIndicatorId, wi => wi.WasteIndicatorId, (wii, wi) => new { wasteid = wii.WasteId, wasteindicatorid = wii.WasteIndicatorId, hazardtypeid = wi.HazardTypeId })
.Join(_db.HazardTypes, w => w.hazardtypeid, h => h.HazardTypeId, (w, h) => new { wasteid = w.wasteid, hazardous = h.Hazardous })
.GroupBy(g => new { g.wasteid, g.hazardous })
.Where(g => g.Key.hazardous == bHazardous && g.Count() >= 1)
select we);
// Now join the 2 object to eliminate all the keys that do not apply
results = results.Where(n => WasteRecordsExcluded.All(t2 => n.WasteId == t2.Key.wasteid));
}
return results;
Maybe something like this:
.....
var results = searchResults.ToList();
.....
.....
.Where(g => g.Key.hazardous == bHazardous && g.Count() >= 1)
select we).ToList();
.....
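A lighter variation on the same idea, sketched here with in-memory stand-ins, is to materialize only the ids to exclude and then filter with Contains, which gives the 'NOT IN' comparison mentioned in the question. The sample records and the excludedIds list are hypothetical; in the real code the list would presumably come from something like WasteRecordsExcluded.Select(g => g.Key.wasteid).ToList():

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the waste records returned by the repository.
var searchResults = new[]
{
    (WasteId: 1, Description: "Oil"),
    (WasteId: 2, Description: "Paper"),
    (WasteId: 3, Description: "Solvent"),
};

// Stand-in for the materialized ids from the exclusion query.
var excludedIds = new List<int> { 1, 3 };

// Keep only the records whose id is NOT in the excluded list.
var results = searchResults
    .Where(w => !excludedIds.Contains(w.WasteId))
    .ToList();

foreach (var w in results)
    Console.WriteLine($"{w.WasteId} {w.Description}");
```

Because excludedIds is a plain local list, the filter no longer references a second data context. Whether the provider translates the Contains call into an SQL IN clause or evaluates it in memory depends on the LINQ provider in use.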
