LINQ query with grouping and conditional result

I have an addition to a similar question I asked before. I have a list of Car objects structured as below:
class Car
{
public string Make { get; set; }
public string Model { get; set; }
public string SecondHand { get; set; }
public int AccidentCount { get; set; }
public int MaintenanceCount { get; set; }
}
List<Car> cars = new List<Car>()
{
new Car(){Make = "Mercedes", Model = "E-200", SecondHand ="N", AccidentCount = 1 ,MaintenanceCount = 0},
new Car(){Make = "Mercedes", Model = "E-200", SecondHand ="N", AccidentCount = 1 ,MaintenanceCount = 1},
new Car(){Make = "Mercedes", Model = "E-200", SecondHand ="Y", AccidentCount = 1 ,MaintenanceCount = 1},
new Car(){Make = "Mercedes", Model = "E-180", SecondHand ="N", AccidentCount = 0 ,MaintenanceCount = 1},
new Car(){Make = "Mercedes", Model = "E-180", SecondHand ="N", AccidentCount = 1 ,MaintenanceCount = 1}
};
What I need in the output of the query is two columns for Make and Model (grouping by them), the sums of AccidentCount and MaintenanceCount in two more columns, and finally "Y" if any car of a given Model has a SecondHand value of "Y", otherwise "N".
The output for the list above should be:
Make      Model  AccidentCount  MaintenanceCount  SecondHand
Mercedes  E-200  3              2                 Y
Mercedes  E-180  1              2                 N

What I need in the output of the query is two columns for Make and Model, grouping by them,
cars.GroupBy(c=> new {c.Make,c.Model})
get sum of AccidentCount and MaintenanceCount in 2 columns
AccidentCount = g.Sum(c=>c.AccidentCount),
MaintenanceCount = g.Sum(c=>c.MaintenanceCount)
and finally if there is any "SecondHand" value "Y" for a given Model output "Y", otherwise "N".
SecondHand = g.Any(c=> c.SecondHand=="Y") ? "Y" : "N"
The final query looks like this:
cars.GroupBy(c=> new {c.Make,c.Model})
.Select(g=> new Car {
Make = g.Key.Make,
Model = g.Key.Model,
AccidentCount = g.Sum(c=>c.AccidentCount),
MaintenanceCount = g.Sum(c=>c.MaintenanceCount),
SecondHand = g.Any(c=> c.SecondHand=="Y") ? "Y" : "N"
});

Related

Comparing two lists with multiple conditions

I have two different lists of the same type. I want to compare the two lists and get the values that have no match.
List of class:
public class pre
{
public int id {get; set;}
public datetime date {get; set;}
public int sID {get; set;}
}
Two lists :
List<pre> pre1 = new List<pre>();
List<pre> pre2 = new List<pre>();
Query which I wrote to get the unmatched values:
var preResult = pre1.where(p1 => !pre
.any(p2 => p2.id == p1.id && p2.date == p1.date && p2.sID == p1sID));
But the result is wrong here. I am getting all the values in pre1.
Here is a solution:
class Program
{
static void Main(string[] args)
{
var pre1 = new List<pre>()
{
new pre {id = 1, date =DateTime.Now.Date, sID=1 },
new pre {id = 7, date = DateTime.Now.Date, sID = 2 },
new pre {id = 9, date = DateTime.Now.Date, sID = 3 },
new pre {id = 13, date = DateTime.Now.Date, sID = 4 },
// ... etc ...
};
var pre2 = new List<pre>()
{
new pre {id = 1, date =DateTime.Now.Date, sID=1 },
// ... etc ...
};
var preResult = pre1.Where(p1 => !pre2.Any(p2 => p2.id == p1.id && p2.date == p1.date && p2.sID == p1.sID)).ToList();
Console.ReadKey();
}
}
Note: The date property contains only the date; the time part is always 00:00:00.
I fixed some typos and tested your code with sensible values, and your code would correctly select unmatched records. As prabhakaran S's answer mentions, perhaps your date values include time components that differ. You will need to check your data and decide how to proceed.
However, a better way to select unmatched records from one list compared against another would be to utilize a left join technique common to working with relational databases, which you can also do in Linq against in-memory collections. It will scale better as the sizes of your inputs grow.
var preResult = from p1 in pre1
join p2 in pre2
on new { p1.id, p1.date, p1.sID }
equals new { p2.id, p2.date, p2.sID } into grp
from item in grp.DefaultIfEmpty()
where item == null
select p1;
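Another option is to hash the keys of the second list once and filter the first list against that set. The following is only a minimal sketch: it uses value tuples (id, date, sID) as stand-ins for the pre class, and the sample values and the names seen and unmatched are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows as value tuples (id, date, sID) standing in for the pre class.
var day = new DateTime(2020, 1, 1);
var pre1 = new List<(int id, DateTime date, int sID)>
{
    (1, day, 1), (7, day, 2), (9, day, 3), (13, day, 4),
};
var pre2 = new List<(int id, DateTime date, int sID)>
{
    (1, day, 1), (9, day, 3),
};

// Hash the keys of the second list once; value tuples compare member by member.
var seen = new HashSet<(int, DateTime, int)>(pre2.Select(p => (p.id, p.date, p.sID)));

// Keep only the rows of pre1 whose key is absent from the set.
var unmatched = pre1.Where(p => !seen.Contains((p.id, p.date, p.sID))).ToList();

foreach (var p in unmatched)
    Console.WriteLine($"{p.id} {p.sID}");
```

With the real pre class the same idea applies: project each object to a tuple of the three compared properties before adding it to, or probing, the set.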

Linq join two lists: is it more efficient to use Dictionary?

Final rephrase
Below I join two sequences, and I wondered whether it would be faster to build a Dictionary from one sequence, using the join's keySelector as the key, and then iterate through the other sequence and look up each key in the dictionary.
This only works if the keys are unique. A real join has no problem with two records having the same key, but a dictionary requires unique keys.
I measured the difference and found that the dictionary method is about 13% faster, which is negligible in most use cases. See my answer to this question.
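If the keys are not unique, a lookup can play the role of the dictionary, since each key then maps to zero or more items. As far as I know this is also roughly how Enumerable.Join works internally (it hashes one sequence and streams the other), but treat that as an assumption rather than a guarantee. A minimal sketch with made-up tuple data:

```csharp
using System;
using System.Linq;

// Hypothetical data: Reference 1 occurs twice in listB, which ToDictionary would reject.
var listA = new[] { (Reference: 1, Name: "a1"), (Reference: 2, Name: "a2"), (Reference: 3, Name: "a3") };
var listB = new[] { (Reference: 1, Name: "b1"), (Reference: 1, Name: "b1bis"), (Reference: 3, Name: "b3") };

// Build the hash-based lookup once; each key maps to zero or more items.
var lookupB = listB.ToLookup(b => b.Reference);

// For each a, lookupB[key] yields all matching b items, or an empty sequence if there are none.
var joined = listA
    .SelectMany(a => lookupB[a.Reference], (a, b) => (A: a.Name, B: b.Name))
    .ToList();

foreach (var pair in joined)
    Console.WriteLine($"{pair.A} - {pair.B}");
```

Both sequences are still enumerated only once, so the cost stays roughly linear in the total number of items plus the number of matches.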
Rephrased question
Some suggested that this question is the same question as LINQ - Using where or join - Performance difference?, but this one is not about using where or join, but about using a Dictionary to perform the join.
My question is: if I want to join two sequences based on a key selector, which method would be faster?
Put all items of one sequence in a Dictionary and enumerate the other sequence to see if the item is in the Dictionary. This would mean to iterate through both sequences once and calculate hash codes on the keySelector for every item in both sequences once.
The other method: use System.Linq.Enumerable.Join.
The question is: Would Enumerable.Join for each element in the first list iterate through the elements in the second list to find a match according to the key selector, having to compare N * N elements (is this called second order?) or would it use a more advanced method?
Original question with examples
I have two classes, both with a property Reference. I have two sequences of these classes and I want to join them based on equal Reference.
class ClassA
{
public string Reference { get; set; }
...
}
class ClassB
{
public string Reference { get; set; }
...
}
var listA = new List<ClassA>()
{
new ClassA() {Reference = "1", ...},
new ClassA() {Reference = "2", ...},
new ClassA() {Reference = "3", ...},
new ClassA() {Reference = "4", ...},
};
var listB = new List<ClassB>()
{
new ClassB() {Reference = "1", ...},
new ClassB() {Reference = "3", ...},
new ClassB() {Reference = "5", ...},
new ClassB() {Reference = "7", ...},
};
After the join I want combinations of ClassA objects and ClassB objects that have an equal Reference. This is quite simple to do:
var myJoin = listA.Join(listB, // join listA and listB
a => a.Reference, // from listA take Reference
b => b.Reference, // from listB take Reference
(objectA, objectB) => // if references equal
new {A = objectA, B = objectB}); // return combination
I'm not sure how this works, but I can imagine that for each a in listA the listB is iterated to see if there is a b in listB with the same reference as A.
Question: if I know that the references are Distinct wouldn't it be more efficient to convert B into a Dictionary and compare the Reference for each element in listA:
var dictB = listB.ToDictionary(b => b.Reference);
var myJoin = listA
.Where(a => dictB.ContainsKey(a.Reference))
.Select(a => new { A = a, B = dictB[a.Reference] });
This way, every element of listB is accessed once to put it in the dictionary, every element of listA is accessed once, and the hash code of each Reference is calculated once.
Would this method be faster for large collections?
I created a test program for this and measured the time it took.
Suppose I have a class Person; each person has a Name and a Father property of type Person. If the father is not known, the Father property is null.
I have a sequence of Bastards (no father), each of whom has exactly one Son and one Daughter. All daughters are put in one sequence and all sons in another.
The query: join the sons and the daughters that have the same father.
Results: Joining 1 million families using Enumerable.Join took 1.169 sec. Joining them using Dictionary join used 1.024 sec. Ever so slightly faster.
The code:
class Person : IEquatable<Person>
{
public string Name { get; set; }
public Person Father { get; set; }
// + a lot of equality functions get hash code etc
// for those interested: see the bottom
}
const int nrOfBastards = 1000000; // one million
var bastards = Enumerable.Range (0, nrOfBastards)
.Select(i => new Person()
{ Name = 'B' + i.ToString(), Father = null })
.ToList();
var sons = bastards.Select(father => new Person()
{Name = "Son of " + father.Name, Father = father})
.ToList();
var daughters = bastards.Select(father => new Person()
{Name = "Daughter of " + father.Name, Father = father})
.ToList();
// join on same parent: Traditionally and using Dictionary
var stopwatch = Stopwatch.StartNew();
this.TraditionalJoin(sons, daughters);
var time = stopwatch.Elapsed;
Console.WriteLine("Traditional join of {0} sons and daughters took {1:F3} sec", nrOfBastards, time.TotalSeconds);
stopwatch.Restart();
this.DictionaryJoin(sons, daughters);
time = stopwatch.Elapsed;
Console.WriteLine("Dictionary join of {0} sons and daughters took {1:F3} sec", nrOfBastards, time.TotalSeconds);
}
private void TraditionalJoin(IEnumerable<Person> boys, IEnumerable<Person> girls)
{ // join on same parent
var family = boys
.Join(girls,
boy => boy.Father,
girl => girl.Father,
(boy, girl) => new { Son = boy.Name, Daughter = girl.Name })
.ToList();
}
private void DictionaryJoin(IEnumerable<Person> sons, IEnumerable<Person> daughters)
{
var sonsDictionary = sons.ToDictionary(son => son.Father);
var family = daughters
.Where(daughter => sonsDictionary.ContainsKey(daughter.Father))
.Select(daughter => new { Son = sonsDictionary[daughter.Father], Daughter = daughter })
.ToList();
}
For those interested in the equality of Persons, needed for a proper dictionary:
class Person : IEquatable<Person>
{
public string Name { get; set; }
public Person Father { get; set; }
public bool Equals(Person other)
{
if (other == null)
return false;
else if (Object.ReferenceEquals(this, other))
return true;
else if (this.GetType() != other.GetType())
return false;
else
return String.Equals(this.Name, other.Name, StringComparison.OrdinalIgnoreCase);
}
public override bool Equals(object obj)
{
return this.Equals(obj as Person);
}
public override int GetHashCode()
{
// Must agree with Equals, which compares Name only and ignores case:
// equal objects have to return equal hash codes.
return StringComparer.OrdinalIgnoreCase.GetHashCode(this.Name);
}
public override string ToString()
{
return this.Name;
}
public static bool operator==(Person x, Person y)
{
if (Object.ReferenceEquals(x, null))
return Object.ReferenceEquals(y, null);
else
return x.Equals(y);
}
public static bool operator!=(Person x, Person y)
{
return !(x==y);
}
}

Linq left join with multiple columns and default values [duplicate]

I am fairly new to linq and I need to join two tables with the following requirements:
Should left join t1 and t2.
If t2 is empty then the query should not fail - should use default values.
My query:
var final = from t1 in saDist.AsEnumerable()
from t2 in sapGrouped.AsEnumerable()
where
t1.Supplier.Id == t2.Supplier.Id && t1.VatRate == t2.VatRate
select
new
{
t1.Supplier,
Amount = t1.Amount - t2.Amount,
Advance = t1.Advance - t2.Advance,
Balance = t1.Balance - t2.Balance,
t1.VatRate
};
Can someone correct this?
This works in Linqpad as a C# program.
Basically your join syntax needed tweaking (see this), and you needed to take into account when there was nothing to join to for "t2" (so we do a null check and use 0 when null, otherwise t2.Amount, etc)
I created some dummy data so you can play around.
See http://codingsense.wordpress.com/2009/03/08/left-join-right-join-using-linq/ for another example.
I hope it does what you want it to do.
Thanks,
Dominique
public class A
{
void Main()
{
Distributor dist1 = new Distributor() { SupplierID = 1, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "A", DeptSupplierID = 1 };
Distributor dist2 = new Distributor() { SupplierID = 2, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "B", DeptSupplierID = 1 };
Distributor dist3 = new Distributor() { SupplierID = 3, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "C", DeptSupplierID = 1 };
Distributor dist4 = new Distributor() { SupplierID = 4, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "D", DeptSupplierID = 2 };
Distributor dist5 = new Distributor() { SupplierID = 5, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "E", DeptSupplierID = 2 };
Distributor dist6 = new Distributor() { SupplierID = 6, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "F", DeptSupplierID = 2 };
Distributor dist7 = new Distributor() { SupplierID = 7, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "G", DeptSupplierID = 6 };
Distributor dist8 = new Distributor() { SupplierID = 8, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "H", DeptSupplierID = 3 };
Distributor dist9 = new Distributor() { SupplierID = 9, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "I", DeptSupplierID = 3 };
Distributor dist10 = new Distributor() { SupplierID = 10, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "J", DeptSupplierID = 7 };
Distributor dist11 = new Distributor() { SupplierID = 11, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "K", DeptSupplierID = 7 };
Distributor dist12 = new Distributor() { SupplierID = 12, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "L", DeptSupplierID = 5 };
SAPGroup Dept1 = new SAPGroup() { SupplierID = 1, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "Development" };
SAPGroup Dept2 = new SAPGroup() { SupplierID = 2, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "Testing" };
SAPGroup Dept3 = new SAPGroup() { SupplierID = 3, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "Marketing" };
SAPGroup Dept4 = new SAPGroup() { SupplierID = 4, Amount = 3, Balance = 4, Advance = 3, VatRateID = 1, Name = "Support" };
List<Distributor> ListOfDistributors = new List<Distributor>();
ListOfDistributors.AddRange(new Distributor[] { dist1, dist2, dist3, dist4, dist5, dist6, dist7,
dist8, dist9, dist10, dist11, dist12 });
List<SAPGroup> ListOfSAPGroup = new List<SAPGroup>();
ListOfSAPGroup.AddRange(new SAPGroup[] { Dept1, Dept2, Dept3, Dept4 });
var final = from t1 in ListOfDistributors
join t2 in ListOfSAPGroup
on new { t1.SupplierID, t1.VatRateID } equals new { t2.SupplierID, t2.VatRateID }
into JoinedDistAndGrouped
from t2 in JoinedDistAndGrouped.DefaultIfEmpty()
select new
{
Name1 = t1.Name,
Name2 = (t2 == null) ? "no name" : t2.Name,
SupplierID = t1.SupplierID,
Amount = t1.Amount - (t2 == null ? 0 : t2.Amount),
Advance = t1.Advance - (t2 == null ? 0 : t2.Advance),
Balance = t1.Balance - (t2 == null ? 0 : t2.Balance),
VatRateID = t1.VatRateID
};
final.Dump();
}
}
class Distributor
{
public string Name { get; set; }
public int SupplierID { get; set; }
public int VatRateID { get; set; }
public int DeptSupplierID { get; set; }
public int Amount { get; set; }
public int Advance { get; set; }
public int Balance { get; set; }
}
class SAPGroup
{
public int SupplierID { get; set; }
public int VatRateID { get; set; }
public string Name { get; set; }
public int Amount { get; set; }
public int Advance { get; set; }
public int Balance { get; set; }
}
public class Result
{
public string Name1 { get; set; }
public string Name2 { get; set; }
public int SupplierID { get; set; }
public int Amount { get; set; }
public int Advance { get; set; }
public int Balance { get; set; }
public int VatRateID { get; set; }
}
Thanks for your input. None of the answers did quite what I wanted, but I managed to get my original code working:
var final = from t2 in saDist.AsEnumerable()
from t1 in sapGrouped.AsEnumerable().DefaultIfEmpty()
where
t1 == null || (t2.Supplier.Id == t1.Supplier.Id && t2.VatRate == t1.VatRate)
select
new
{
t2.Supplier,
Amount = t2.Amount - (t1 == null ? 0 : t1.Amount),
Advance = t2.Advance - (t1 == null ? 0 : t1.Advance),
Balance = t2.Balance - (t1 == null ? 0 : t1.Balance),
t2.VatRate
};
If you have any comments or improvements on this let me know, thanks.
According to this, you are looking for something like (this is untested, but hopefully leads you on the right track):
var final = from t1 in saDist.AsEnumerable()
join t2 in sapGrouped.AsEnumerable()
// C# has no "and" in a join clause; compare composite keys instead
on new { t1.Supplier.Id, t1.VatRate }
equals new { t2.Supplier.Id, t2.VatRate } into t1_t2
from t2 in t1_t2.DefaultIfEmpty()
select new
{
t1.Supplier,
Amount = t1.Amount - (t2 == null ? 0 : t2.Amount),
Advance = t1.Advance - (t2 == null ? 0 : t2.Advance),
Balance = t1.Balance - (t2 == null ? 0 : t2.Balance),
t1.VatRate
};
Notice the .DefaultIfEmpty(), this satisfies: "If t2 is empty then the query should not fail - should use default values."
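To try the left join pattern in isolation, here is a small self-contained sketch. The data is invented and uses anonymous types instead of the real saDist and sapGrouped rows; because anonymous types are reference types, an unmatched t2 comes back as null and is replaced by 0.

```csharp
using System;
using System.Linq;

// Hypothetical rows; anonymous types are reference types, so a missing match is null.
var dists = new[]
{
    new { SupplierID = 1, VatRateID = 1, Amount = 10 },
    new { SupplierID = 2, VatRateID = 1, Amount = 20 },
};
var groups = new[]
{
    new { SupplierID = 1, VatRateID = 1, Amount = 3 },
};

// Left join on a composite key; DefaultIfEmpty supplies null when nothing matches.
var final = (from t1 in dists
             join t2 in groups
                 on new { t1.SupplierID, t1.VatRateID } equals new { t2.SupplierID, t2.VatRateID }
                 into matches
             from t2 in matches.DefaultIfEmpty()
             select new { t1.SupplierID, Amount = t1.Amount - (t2?.Amount ?? 0) }).ToList();

foreach (var row in final)
    Console.WriteLine($"{row.SupplierID}: {row.Amount}");
```

Supplier 2 has no matching group, so its Amount is left unchanged instead of the query failing.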

Linq Convert to Custom Dictionary?

.NET 4, I have
public class Humi
{
public int huKey { get; set; }
public string huVal { get; set; }
}
And in another class is this code in a method:
IEnumerable<Humi> someHumi = new List<Humi>(); //This is actually ISingleResult that comes from a LinqToSql-fronted sproc but I don't think is relevant for my question
var humia = new Humi { huKey = 1 , huVal = "a"};
var humib = new Humi { huKey = 1 , huVal = "b" };
var humic = new Humi { huKey = 2 , huVal = "c" };
var humid = new Humi { huKey = 2 , huVal = "d" };
I want to create a single IDictionary <int,string[]>
with key 1 containing ["a","b"] and key 2 containing ["c","d"]
Can anyone point out a decent way to do that conversion with Linq?
Thanks.
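var myDict = someHumi
.GroupBy(h => h.huKey)
.ToDictionary(
g => g.Key,
g => g.Select(h => h.huVal).ToArray());
Create an IEnumerable<IGrouping<int, Humi>>, then project it into a dictionary, selecting huVal so that each value is a string[]. Note that ToDictionary returns a Dictionary<int, string[]>, which can be assigned to an IDictionary<int, string[]> variable.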
You can use ToLookup() which allows each key to hold multiple values, exactly your scenario (note that each key would hold an IEnumerable<string> of values though not an array):
var myLookup = someHumi.ToLookup(x => x.huKey, x => x.huVal);
foreach (var item in myLookup)
{
Console.WriteLine("{0} contains: {1}", item.Key, string.Join(",", item));
}
Output:
1 contains: a,b
2 contains: c,d
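If an actual IDictionary<int, string[]> is still required, the lookup can be turned into one in a single extra step. A minimal sketch, using (huKey, huVal) tuples as hypothetical stand-ins for the Humi objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows as (huKey, huVal) tuples standing in for Humi objects.
var someHumi = new[] { (huKey: 1, huVal: "a"), (huKey: 1, huVal: "b"), (huKey: 2, huVal: "c"), (huKey: 2, huVal: "d") };

// Group the values by key, then materialize each group as an array.
IDictionary<int, string[]> dict = someHumi
    .ToLookup(h => h.huKey, h => h.huVal)
    .ToDictionary(g => g.Key, g => g.ToArray());

foreach (var kv in dict)
    Console.WriteLine($"{kv.Key}: [{string.Join(",", kv.Value)}]");
```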

Linq to CSV select by column value

I know I have asked this question in a different manner earlier today but I have refined my needs a little better.
Given the following csv file where the first column is the title and there could be any number of columns;
year,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017
income,1000,1500,2000,2100,2100,2100,2100,2100,2100,2100
dividends,100,200,300,300,300,300,300,300,300,300
net profit,1100,1700,2300,2400,2400,2400,2400,2400,2400,2400
expenses,500,600,500,400,400,400,400,400,400,400
profit,600,1100,1800,2000,2000,2000,2000,2000,2000,2000
How do I select the profit value for a given year? So I may provide a year of say 2011 and expect to get the profit value of 2000 back.
At the moment I have this which shows the profit value for each year but ideally I'd like to specify the year and get the profit value;
var data = File.ReadAllLines(fileName)
.Select(
l => {
var split = l.Split(",".ToCharArray());
return split;
}
);
var profit = (from p in data where p[0] == profitFieldName select p).SingleOrDefault();
var years = (from p in data where p[0] == yearFieldName select p).FirstOrDefault();
int columnCount = years.Count() ;
for (int t = 1; t < columnCount; t++)
Console.WriteLine("{0} : ${1}", years[t], profit[t]);
I've already answered this once today, but this answer is a little more fleshed out and hopefully clearer.
string rowName = "profit";
string year = "2011";
var yearRow = data.First();
var yearIndex = Array.IndexOf(yearRow, year);
// get your 'profits' row, or whatever row you want
var row = data.Single(d => d[0] == rowName);
// return the appropriate index for that row.
return row[yearIndex];
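For completeness, here is the same lookup as a small runnable sketch. It uses a shortened inline copy of the file instead of File.ReadAllLines, and the helper name Lookup and its empty-string fallback are my own choices, not part of the original code.

```csharp
using System;
using System.Linq;

// Inline sample lines (a shortened copy of the file in the question) instead of File.ReadAllLines.
var lines = new[]
{
    "year,2008,2009,2010,2011",
    "income,1000,1500,2000,2100",
    "profit,600,1100,1800,2000",
};
var data = lines.Select(l => l.Split(',')).ToList();

// Hypothetical helper: returns the cell for a row title and year,
// or an empty string if either the row or the year is missing.
string Lookup(string rowName, string year)
{
    var yearIndex = Array.IndexOf(data[0], year);
    var row = data.SingleOrDefault(d => d[0] == rowName);
    return yearIndex > 0 && row != null ? row[yearIndex] : string.Empty;
}

Console.WriteLine(Lookup("profit", "2011"));
```

Checking that yearIndex is greater than 0 guards against both an unknown year (IndexOf returns -1) and the title cell in column 0.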
This works for me.
You have an unfortunate data format, but I think the best thing to do is just to define a class, create a list, and then use your inputs to create objects to add to the list. Then you can do whatever querying you need to get your desired results.
class MyData
{
public string Year { get; set; }
public decimal Income { get; set; }
public decimal Dividends { get; set; }
public decimal NetProfit { get; set; }
public decimal Expenses { get; set; }
public decimal Profit { get; set; }
}
// ...
string dataFile = @"C:\Temp\data.txt";
List<MyData> list = new List<MyData>();
using (StreamReader reader = new StreamReader(dataFile))
{
string[] years = reader.ReadLine().Split(',');
string[] incomes = reader.ReadLine().Split(',');
string[] dividends = reader.ReadLine().Split(',');
string[] netProfits = reader.ReadLine().Split(',');
string[] expenses = reader.ReadLine().Split(',');
string[] profits = reader.ReadLine().Split(',');
for (int i = 1; i < years.Length; i++) // index 0 is a title
{
MyData myData = new MyData();
myData.Year = years[i];
myData.Income = decimal.Parse(incomes[i]);
myData.Dividends = decimal.Parse(dividends[i]);
myData.NetProfit = decimal.Parse(netProfits[i]);
myData.Expenses = decimal.Parse(expenses[i]);
myData.Profit = decimal.Parse(profits[i]);
list.Add(myData);
}
}
// query for whatever data you need
decimal maxProfit = list.Max(data => data.Profit);

Resources