Using LINQ to determine the missing combination of records

Let me start to explain with an example.
var vProducts = new[] {
new { Product = "A", Location ="Y", Month = "January", Demand = 50 },
new { Product = "A", Location ="Y", Month = "February", Demand = 100 },
new { Product = "A", Location ="Y", Month = "March", Demand = 20 },
new { Product = "A", Location ="Y", Month = "June", Demand = 10 }
};
var vPeriods = new[] {
new { Priority = 1, Month = "January" },
new { Priority = 2, Month = "February" },
new { Priority = 3, Month = "March" },
new { Priority = 4, Month = "April" },
new { Priority = 5, Month = "May" },
new { Priority = 6, Month = "June" }
};
var vAll = from p in vProducts
from t in vPeriods
select new
{
Product = p.Product,
Location = p.Location,
Period = t.Priority,
PeriodName = t.Month,
Demand = p.Demand
};
The query above creates every combination of products and periods. What I need instead is a list of all periods for each product, including the periods that have no matching Month, as shown below.
example
Product Location Priority Month Demand
A Y 1 January 50
A Y 2 February 100
A Y 3 March 20
A Y 4 April null
A Y 5 May null
A Y 6 June 10
Thanks for any comments.

You want to do a left outer join, it will go something like:
var res = from period in vPeriods
join product in vProducts on period.Month equals product.Month into bla
from p in bla.DefaultIfEmpty()
select new { period.Month, period.Priority, Product = p == null ? null : p.Product, Demand = p == null ? -1 : p.Demand };
foreach (var a in res)
{
Console.WriteLine(string.Format("{0} {1} {2}", a.Product, a.Month, a.Demand));
}
Of course, months that have no matching product row will not have a location etc., so Product comes back as null and Demand as -1 for them.
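If you also want Product and Location filled in for the missing months, with Demand left as null (as in the table in the question), one option is to cross join the distinct product/location pairs with the periods and then left join the demand rows onto that. This is only a sketch under that assumption; the DemandRow class and the FillMissingPeriods name are made up for illustration and are not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissingPeriods
{
    // Hypothetical result type; Demand stays null when no source row exists.
    public sealed class DemandRow
    {
        public string Product { get; set; }
        public string Location { get; set; }
        public int Priority { get; set; }
        public string Month { get; set; }
        public int? Demand { get; set; }
    }

    public static List<DemandRow> FillMissingPeriods()
    {
        var vProducts = new[] {
            new { Product = "A", Location = "Y", Month = "January", Demand = 50 },
            new { Product = "A", Location = "Y", Month = "February", Demand = 100 },
            new { Product = "A", Location = "Y", Month = "March", Demand = 20 },
            new { Product = "A", Location = "Y", Month = "June", Demand = 10 }
        };
        var vPeriods = new[] {
            new { Priority = 1, Month = "January" },
            new { Priority = 2, Month = "February" },
            new { Priority = 3, Month = "March" },
            new { Priority = 4, Month = "April" },
            new { Priority = 5, Month = "May" },
            new { Priority = 6, Month = "June" }
        };

        // Distinct product/location pairs (anonymous types compare by value).
        var keys = vProducts.Select(p => new { p.Product, p.Location }).Distinct();

        var query =
            from k in keys
            from t in vPeriods
            // Left outer join: keep every (pair, period) even without a demand row.
            join p in vProducts
                on new { k.Product, k.Location, t.Month }
                equals new { p.Product, p.Location, p.Month } into matches
            from m in matches.DefaultIfEmpty()
            orderby k.Product, k.Location, t.Priority
            select new DemandRow
            {
                Product = k.Product,
                Location = k.Location,
                Priority = t.Priority,
                Month = t.Month,
                Demand = m == null ? (int?)null : m.Demand
            };

        return query.ToList();
    }
}
```

Calling MissingPeriods.FillMissingPeriods() gives six rows for product A, with a null Demand for April and May.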

Related

Group by with maximum

I want to group by category, show its name, then show the highest id that is related to it. Here's some data, and the result that I want is further down. Any ideas? I've been playing around with GroupJoin but can't seem to get it to work.
My Data
var stuff = new[] {
new {id = 5, catId = 2},
new {id = 56, catId = 2},
new {id = 56, catId = 2},
new {id = 8, catId = 1},
new {id = 9, catId = 3}};
var categories = new[] {
new {catId = 1, Name = "Water"},
new {catId = 4, Name = "Wind"},
new {catId = 2, Name = "Fire"}};
What I want my results to look like
Water - 8
Wind - null
Fire - 56
categories
.GroupJoin
(
stuff,
c=>c.catId,
s=>s.catId,
(c,s)=>new
{
c.Name,
Max = s.Any() ? (int?)s.Max (m => m.id) : null
}
);
It seems that you want a "LEFT OUTER JOIN" with LINQ:
var query = from cat in categories
join s in stuff
on cat.catId equals s.catId into gj
from stuffJoin in gj.DefaultIfEmpty()
group stuffJoin by new { cat.catId, cat.Name } into catGroup
select new {
Category = catGroup.Key.Name,
MaxID = catGroup.Max(s => s == null ? 0 : s.id) // stuff is null for Wind
};
foreach (var x in query)
Console.WriteLine("Category: {0} Max-ID: {1}", x.Category, x.MaxID);
Outputs:
Category: Water Max-ID: 8
Category: Wind Max-ID: 0
Category: Fire Max-ID: 56

Put value in datatable group by week

I have a DataTable that contains values from 1 January to March, something like this:
DATE Employee Job1 Job2
1/4/2013 A 1.3 2
1/4/2013 B 2.5 6
1/6/2013 C 3.7 2.4
1/7/2013 D 11
1/7/2013 F 334 0
1/8/2013 A 1.87 1
1/8/2013 B 6.85 2
1/9/2013 C 58 226
1/16/2013 A 9.43 1.45
1/16/2013 B 5.27 0.6
1/22/2013 C 45.4 5
1/23/2013 A 44 4.78
1/29/2013 B 45 40
2/2/2013 C 45 54.12
2/2/2013 D 7 4.4587
2/3/2013 F 265 11.486
Update:
DataTable datatable = new DataTable("Employee");
datatable.Columns.Add("Date", typeof(string));
datatable.Columns.Add("Employee", typeof(string));
datatable.Columns.Add("Job1", typeof(double));
datatable.Columns.Add("Job2", typeof(double));
datatable.Rows.Add(new Object[] { "1/4/2013", "A", 1.3, 2 });
datatable.Rows.Add(new Object[] { "1/4/2013", "B", 2.5, 6 });
datatable.Rows.Add(new Object[] { "1/6/2013", "C", 3.7, 2.4 });
datatable.Rows.Add(new Object[] { "1/7/2013", "D", 11, 0.0 });
datatable.Rows.Add(new Object[] { "1/7/2013", "F", 334, 0 });
datatable.Rows.Add(new Object[] { "1/8/2013", "A", 1.87, 1 });
datatable.Rows.Add(new Object[] { "1/8/2013", "B", 6.85, 2 });
datatable.Rows.Add(new Object[] { "1/9/2013", "C", 58, 226 });
datatable.Rows.Add(new Object[] { "1/16/2013", "A", 9.43, 1.45 });
datatable.Rows.Add(new Object[] { "1/16/2013", "B", 5.27, 0.6 });
datatable.Rows.Add(new Object[] { "1/22/2013", "C", 45.4, 5 });
datatable.Rows.Add(new Object[] { "1/23/2013", "A", 44, 4.78 });
datatable.Rows.Add(new Object[] { "1/29/2013", "B", 45, 40 });
datatable.Rows.Add(new Object[] { "2/2/2013", "C", 45, 54.12 });
datatable.Rows.Add(new Object[] { "2/2/2013", "D", 7, 4.4587 });
datatable.Rows.Add(new Object[] { "2/3/2013", "F", "265", 11.486 });
datatable.Rows.Add(new Object[] { "3/3/2013", "A", "25", 28.124 });
I want to sum the Job1 values by week, where a week runs from Monday to Sunday. This is the code I have written so far:
DateTime minDate = datatable.AsEnumerable()
.Min(r => DateTime.Parse(r.Field<string>("DATE")));
DateTime startDate = minDate.Date.Date.AddDays(+((6 + minDate.DayOfWeek
- DayOfWeek.Monday) % 7));
DateTime nextDate = startDate.AddDays(6);
DateTime maxDate = datatable.AsEnumerable()
.Max(r => DateTime.Parse(r.Field<string>("DATE")));
while (nextDate < maxDate)
{
var weekEmpGroups = datatable.AsEnumerable()
.Select(r => new
{
Row = r,
Employee = r.Field<String>("Employee"),
Date = DateTime.Parse(r.Field<string>("DATE"))
// week = minDate.Date.Date.AddDays(+((6 + minDate.DayOfWeek - DayOfWeek.Monday) % 7))
})
.GroupBy(x => x.Employee);
DataTable dtWeeklyResults = new DataTable();
dtWeeklyResults.Columns.Add("Employee", typeof(string));
var dtf = System.Globalization.CultureInfo.CurrentCulture.DateTimeFormat;
double weekCount = 0.0;
string expression;
DataRow[] foundRows;
foreach (var empGroup in weekEmpGroups)
{
string employee = empGroup.Key;
var newRow = dtWeeklyResults.Rows.Add();
newRow["Employee"] = employee;
expression = "Employee=" + employee + " AND Date Between " + startDate
+ " And " + nextDate;
foundRows = datatable.Select(expression);
if (foundRows.Length > 0)
{
// add values using linq
}
}
Please tell me whether this is the correct way to do it, and how to add up all the values per week. The result should look like this for Job1:
Employee 1/7-1/13 1/14-1/20 1/21-1/27 1/28-2/3 and so on...
A sum of values for this 7 days
B
C
D
Can anybody suggest how to achieve this by LINQ?
Helper methods
private static string GetColumnName(int weekNumber)
{
DateTime jan1 = new DateTime(2013, 1, 1);
int daysOffset = DayOfWeek.Monday - jan1.DayOfWeek;
DateTime firstMonday = jan1.AddDays(daysOffset);
var cal = ci.Calendar;
int firstWeek = cal.GetWeekOfYear(firstMonday, ci.DateTimeFormat.CalendarWeekRule, ci.DateTimeFormat.FirstDayOfWeek);
if (firstWeek <= 1)
{
weekNumber -= 1;
}
DateTime result = firstMonday.AddDays((weekNumber-1) * 7);
return string.Format("{0}-{1}", result.ToString("M/d", ci), result.AddDays(6).ToString("M/d", ci));
}
private static int GetWeekOfYear(DateTime value)
{
return ci.Calendar.GetWeekOfYear(value, ci.DateTimeFormat.CalendarWeekRule, ci.DateTimeFormat.FirstDayOfWeek);
}
CultureInfo instance
static CultureInfo ci = new CultureInfo("en-us");
Logic
// load parsed data from DataTable to a list
var data = (from row in dt.AsEnumerable()
select new
{
Date = DateTime.Parse(row.Field<string>("Date"), ci),
Employee = row.Field<string>("Employee"),
Value = row.Field<double>("Job1")
}).ToList();
// find min/max date and week number
var minDateTime = data.Select(i => i.Date).Min();
var maxDateTime = data.Select(i => i.Date).Max();
var minWeekNumber = GetWeekOfYear(minDateTime);
var maxWeekNumber = GetWeekOfYear(maxDateTime);
// prepare result DataTable
var resultDt = new DataTable("Job1");
resultDt.Columns.Add("Employee", typeof(string));
for (int i = minWeekNumber; i <= maxWeekNumber; i++)
resultDt.Columns.Add(i.ToString(), typeof(double));
// prepare grouped data query
var employeeData = from d in data
group d by d.Employee into g
select new
{
Employee = g.Key,
Items = g.GroupBy(x => GetWeekOfYear(x.Date))
.Select(x => new
{
Week = x.Key,
Value = x.Sum(xx => xx.Value)
})
};
// iterate over query results and fill resultsDt
foreach (var e in employeeData)
{
var newRow = resultDt.NewRow();
newRow["Employee"] = e.Employee;
foreach (var d in e.Items)
newRow[d.Week.ToString()] = d.Value;
resultDt.Rows.Add(newRow);
}
// change column names from week numbers to proper start-end dates
foreach(DataColumn col in resultDt.Columns)
{
int weekNumber;
if (int.TryParse(col.ColumnName, out weekNumber))
col.ColumnName = GetColumnName(weekNumber);
}
Results:
Job1
Employee 1/7-1/13 1/14-1/20 1/21-1/27 1/28-2/3 2/4-2/10 2/11-2/17 2/18-2/24 2/25-3/3 3/4-3/10 3/11-3/
A 1,3 1,87 9,43 44 2
B 2,5 6,85 5,27 45
C 61,7 45,4 45
D 11 7
F 334 265
If anybody has the same requirement, that is, to show the results week by week, here is the code:
private static DataTable GetWeeklyColumnsAndData(DataTable datatable, string resultFor)
{
DateTime minDate = datatable.AsEnumerable()
.Min(r => DateTime.Parse(r.Field<string>("DATE")));
DateTime maxDate = datatable.AsEnumerable()
.Max(r => DateTime.Parse(r.Field<string>("DATE")));
var distinctValues = datatable.AsEnumerable()
.Select(row => new
{
Employee = row.Field<string>("Employee")
})
.Distinct()
.ToList();
int totalEmployeeCount = System.Linq.Enumerable.Count(distinctValues);
DataTable resultDt = new DataTable();
resultDt.Columns.Add("Employee", typeof(string));
DateTime firstMonday = (minDate.DayOfWeek == DayOfWeek.Monday) ? minDate : GetNextWeekday(minDate, DayOfWeek.Monday);
DateTime startingMonday = firstMonday;
// add columns first
while (firstMonday < maxDate)
{
string weekName = string.Format("{0}-{1}", firstMonday.ToString("M/d", ci), firstMonday.AddDays(6).ToString("M/d", ci));
resultDt.Columns.Add(weekName, typeof(string));
firstMonday = firstMonday.AddDays(7);
}
for (int row = 0; row < totalEmployeeCount; row++)
{
DateTime startDate = startingMonday;
DateTime endDate = startingMonday.AddDays(6);
DataRow newRow = resultDt.NewRow();
string employee = distinctValues[row].Employee.ToString();
// first column for entity
newRow[0] = employee;
for (int col = 1; col < resultDt.Columns.Count; col++)
{
bool isBlank = false;
double total = 0;
string formattedEndDate = endDate.ToString("M/d/yyyy");
string expression = String.Format("Employee = '{0}' AND DATE >= #{1}# AND DATE <= #{2}#", employee, startDate.ToString("M/d/yyyy"), formattedEndDate);
DataView dv = datatable.DefaultView;
dv.RowFilter = expression;
if (dv.Count > 0)
{
foreach (DataRowView rowView in dv)
{
DataRow r = rowView.Row;
string value = r[resultFor].ToString();
if (value != "")
{
total += Convert.ToDouble(value);
}
else
{
isBlank = true;
}
}
}
else
{
isBlank = true;
}
if (total == 0 && isBlank)
{
newRow[col] = "";
}
else
{
newRow[col] = total;
}
startDate = endDate.AddDays(1);
endDate = startDate.AddDays(6);
}
resultDt.Rows.Add(newRow);
}
return resultDt;
}
public static DateTime GetNextWeekday(DateTime start, DayOfWeek day)
{
// The (... + 7) % 7 ensures we end up with a value in the range [0, 6]
int daysToAdd = ((int)day - (int)start.DayOfWeek + 7) % 7;
return start.AddDays(daysToAdd);
}
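The Monday-to-Sunday bucketing that both answers work towards can also be expressed by mapping each date to the Monday that starts its week and grouping on that value. A minimal sketch, assuming the rows have already been parsed into (Date, Employee, Value) tuples; the WeeklyTotals class and its method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeeklyTotals
{
    // Returns the Monday that starts the Monday-to-Sunday week containing 'date'.
    public static DateTime StartOfWeek(DateTime date)
    {
        // The (... + 7) % 7 keeps the offset in the range [0, 6], so Sunday maps to 6.
        int daysSinceMonday = ((int)date.DayOfWeek - (int)DayOfWeek.Monday + 7) % 7;
        return date.Date.AddDays(-daysSinceMonday);
    }

    // Sums the values per employee and per week start.
    public static Dictionary<string, Dictionary<DateTime, double>> SumByWeek(
        IEnumerable<(DateTime Date, string Employee, double Value)> rows)
    {
        return rows
            .GroupBy(r => r.Employee)
            .ToDictionary(
                g => g.Key,
                g => g.GroupBy(r => StartOfWeek(r.Date))
                      .ToDictionary(w => w.Key, w => w.Sum(r => r.Value)));
    }
}
```

Each inner dictionary key is a week's Monday, so a column name such as "1/7-1/13" can be built from the key and key.AddDays(6). Weeks with no rows simply have no entry, which corresponds to the blank cells in the result table.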

How can I create a data set with filler rows in linq?

If I have a list like this:
var teams = new List<string>() { "Team A", "Team B", "Team C" };
And I have a data set with scores like this:
var scores = new List<scoredata> {
new scoredata() { Team = "Team A", Date = "1/1/2012", Value = 1 },
new scoredata() { Team = "Team B", Date = "1/1/2012", Value = 1 },
new scoredata() { Team = "Team C", Date = "1/1/2012", Value = 1 },
new scoredata() { Team = "Team A", Date = "1/2/2012", Value = 2 },
new scoredata() { Team = "Team B", Date = "1/3/2012", Value = 3 },
new scoredata() { Team = "Team C", Date = "1/4/2012", Value = 4 }
};
Is it possible to construct a data set that looks like this?
Team A, '1/1/2012', 1
Team B, '1/1/2012', 1
Team C, '1/1/2012', 1
Team A, '1/2/2012', 2
Team B, '1/2/2012', null
Team C, '1/2/2012', null
Team A, '1/3/2012', null
Team B, '1/3/2012', 3
Team C, '1/3/2012', null
Team A, '1/4/2012', null
Team B, '1/4/2012', null
Team C, '1/4/2012', 4
I'm not sure what this is called, but I want to fill out blank dates and scores in my final dataset so that it always returns all Teams for each date, but if score data is not available returns null.
var dates = scores.Select(s => s.Date).Distinct();
var result =
from date in dates
from team in teams
let teamScores = scores.Where(s => s.Team == team && s.Date == date)
orderby date
select new { team, date, Score = teamScores.FirstOrDefault() };
I didn't check this with a compiler, though, so give it a try.
Using pure LINQ to Objects.
public class ScoreData
{
public string Team { get; set; }
public string Date { get; set; }
public int? Value { get; set; }
}
var teams = new[] { "Team A", "Team B", "Team C" };
var scores = new[]
{
new ScoreData { Team = "Team A", Date = "1/1/2012", Value = 1 },
new ScoreData { Team = "Team B", Date = "1/1/2012", Value = 1 },
new ScoreData { Team = "Team C", Date = "1/1/2012", Value = 1 },
new ScoreData { Team = "Team A", Date = "1/2/2012", Value = 2 },
new ScoreData { Team = "Team B", Date = "1/3/2012", Value = 3 },
new ScoreData { Team = "Team C", Date = "1/4/2012", Value = 4 },
};
var dates = scores.Select(score => score.Date).Distinct();
var query =
from date in dates
from team in teams
join score in scores
on new { Team = team, Date = date }
equals new { score.Team, score.Date }
into filteredScores
let defaultScore = new ScoreData
{
Team = team,
Date = date,
Value = null,
}
from score in filteredScores.DefaultIfEmpty(defaultScore)
select score;
Note that this most likely won't work as-is in LINQ to SQL or LINQ to Entities; it will need some tweaks.

How to get the Max() of a Count() with LINQ

I'm new to LINQ and I have this situation. I have this table:
ID Date Range
1 10/10/10 9-10
2 10/10/10 9-10
3 10/10/10 9-10
4 10/10/10 8-9
5 10/11/10 1-2
6 10/11/10 1-2
7 10/12/10 5-6
I just want to list the maximum number of rows per date by range, like this:
Date Range Total
10/10/10 9-10 3
10/11/10 1-2 2
10/12/10 5-6 1
I want to do this by using LINQ, do you have any ideas of how to do this?
I think something along these lines should work:
List<MyTable> items = GetItems();
var orderedByMax = from i in items
group i by i.Date into g
let q = g.GroupBy(i => i.Range)
.Select(g2 => new {Range = g2.Key, Count = g2.Count()})
.OrderByDescending(i => i.Count)
let max = q.FirstOrDefault()
select new {
Date = g.Key,
Range = max.Range,
Total = max.Count
};
Using extension methods:
List<MyTable> items = GetItems();
var rangeTotals = items.GroupBy(x => new { x.Date, x.Range }) // Group by Date + Range
.Select(g => new {
Date = g.Key.Date,
Range = g.Key.Range,
Total = g.Count() // Count total of identical ranges per date
});
var rangeMaxTotals = rangeTotals.Where(rt => !rangeTotals.Any(z => z.Date == rt.Date && z.Total > rt.Total)); // Get maximum totals for each date
Unfortunately I can't test this at the moment, but give this a try:
List<MyTable> items = GetItems();
items.Max(t=>t.Range.Distinct().Count());
This approach:
1) Groups by Date
2) For each Date, groups by Range and calculates the Total
3) For each Date, selects the item with the greatest Total
4) You end up with your result
public sealed class Program
{
public static void Main(string[] args)
{
var items = new[]
{
new { ID = 1, Date = new DateTime(10, 10, 10), Range = "9-10" },
new { ID = 2, Date = new DateTime(10, 10, 10), Range = "9-10" },
new { ID = 3, Date = new DateTime(10, 10, 10), Range = "9-10" },
new { ID = 4, Date = new DateTime(10, 10, 10), Range = "8-9" },
new { ID = 5, Date = new DateTime(10, 10, 11), Range = "1-2" },
new { ID = 6, Date = new DateTime(10, 10, 11), Range = "1-2" },
new { ID = 7, Date = new DateTime(10, 10, 12), Range = "5-6" },
};
var itemsWithTotals = items
.GroupBy(item => item.Date) // Group by Date.
.Select(groupByDate => groupByDate
.GroupBy(item => item.Range) // Group by Range.
.Select(groupByRange => new
{
Date = groupByDate.Key,
Range = groupByRange.Key,
Total = groupByRange.Count()
}) // Got the totals for each grouping.
.MaxElement(item => item.Total)); // For each Date, grab the item (grouped by Range) with the greatest Total.
foreach (var item in itemsWithTotals)
Console.WriteLine("{0} {1} {2}", item.Date.ToShortDateString(), item.Range, item.Total);
Console.Read();
}
}
/// <summary>
/// From the book LINQ in Action, Listing 5.35.
/// </summary>
static class ExtensionMethods
{
public static TElement MaxElement<TElement, TData>(this IEnumerable<TElement> source, Func<TElement, TData> selector) where TData : IComparable<TData>
{
if (source == null)
throw new ArgumentNullException("source");
if (selector == null)
throw new ArgumentNullException("selector");
bool firstElement = true;
TElement result = default(TElement);
TData maxValue = default(TData);
foreach (TElement element in source)
{
var candidate = selector(element);
if (firstElement || (candidate.CompareTo(maxValue) > 0))
{
firstElement = false;
maxValue = candidate;
result = element;
}
}
return result;
}
}
According to LINQ in Action (Chapter 5.3.3 - Will LINQ to Objects hurt the performance of my code?), using the MaxElement extension method is one of the most efficient approaches. I think the performance would be O(4n), that is, linear: one pass for the first GroupBy, a second for the second GroupBy, a third for the Count(), and a fourth for the loop within MaxElement.
DrDro's approach is going to be more like O(n^2) since it loops the entire list for each item in the list.
StriplingWarrior's approach is going to be closer to O(n log n) because it sorts the items. Though I'll admit, there may be some crazy magic in there that I don't understand.

Simple LINQ query

I have a List of X items. I want a LINQ query that will convert it into batches (a List of Lists), where each batch has 4 items, except for the last one, which can have 1-4 (whatever the remainder is). Also, the number 4 should be configurable, so it could be 5, 17, etc.
Can anyone tell me how to write that?
List<Item> myItems = ...;
List<List<Item>> myBatches = myItems.????
Thank you in advance!
If you're happy with the results being typed as IEnumerable<IEnumerable<T>> then you can do this:
int groupSize = 4;
var myBatches = myItems.Select((x, i) => new { Val = x, Idx = i })
.GroupBy(x => x.Idx / groupSize,
x => x.Val);
If you want an actual List<List<T>> then you'll need to add a couple of extra ToList calls:
int groupSize = 4;
var myBatches = myItems.Select((x, i) => new { Val = x, Idx = i })
.GroupBy(x => x.Idx / groupSize,
x => x.Val,
(k, g) => g.ToList())
.ToList();
Here is a good article about using Take and Skip to do paging, which is identical functionality to what you are requesting. It doesn't get you all of the way to a single line of LINQ, but hopefully helps.
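For what it's worth, the Skip/Take paging idea can be turned into batching roughly like this. It is just a sketch; the Batching class and the InBatches name are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Splits 'source' into consecutive lists of at most 'size' items using Skip/Take.
    public static List<List<T>> InBatches<T>(IReadOnlyList<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        // Ceiling division: the last batch holds the remainder.
        int batchCount = (source.Count + size - 1) / size;

        return Enumerable.Range(0, batchCount)
            .Select(i => source.Skip(i * size).Take(size).ToList())
            .ToList();
    }
}
```

With ten items and a size of 4 this gives batches of 4, 4 and 2. Depending on the runtime, Skip may walk the list from the start for every batch, so for large lists the GroupBy version above is probably the safer choice.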
This made me think of how we did this before LINQ.
var vessels = new List<Vessel>()
{ new Vessel() { id = 8, name = "Millennium Falcon" },
new Vessel() { id = 4, name = "Ebon Hawk" },
new Vessel() { id = 34, name = "Virago"},
new Vessel() { id = 12, name = "Naboo royal starship"},
new Vessel() { id = 17, name = "Radiant VII"},
new Vessel() { id = 7, name = "Lambda-class shuttle"},
new Vessel() { id = 23, name = "Rogue Shadow"}};
var chunksize=2;
// With LINQ
var vesselGroups = vessels.Select((v, i) => new { Vessel = v, Index = i })
.GroupBy(c => c.Index / chunksize, c => c.Vessel, (t,e)=>e.ToList())
.ToList();
// Before LINQ (most probably not optimal)
var groupedVessels = new List<List<Vessel>>();
var g = new List<Vessel>();
var chunk = chunksize;
foreach(var vessel in vessels)
{
g.Add(vessel);
chunk--;
if (chunk == 0)
{
groupedVessels.Add(g);
g = new List<Vessel>();
chunk = chunksize;
}
}
// add the final, partially filled chunk (skip it when the count divides evenly)
if (g.Count > 0)
groupedVessels.Add(g);
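If you are on .NET 6 or later, the built-in Enumerable.Chunk method does the batching for you. A short sketch, assuming that runtime; the ChunkExample wrapper and its ToBatches name are only for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkExample
{
    // Chunk yields arrays of up to 'size' items; convert them to lists
    // to get the List<List<T>> shape asked for in the question.
    public static List<List<T>> ToBatches<T>(IEnumerable<T> items, int size)
    {
        return items.Chunk(size)
                    .Select(chunk => chunk.ToList())
                    .ToList();
    }
}
```

With seven items and a size of 2, this returns four lists, the last of which holds a single item.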
