Filtering out values from a list of objects in a List - performance

I have an IEnumerable<UnitGroup> collection, where:
public class UnitGroup
{
public string key { get; set; }
public List<UnitType> NameList { get; set; }
}
public class UnitType
{
public string UnitName { get; set; }
public string Description { get; set; }
}
Now I want to filter the IEnumerable<UnitGroup> based on UnitType's UnitName.
For example, I want to keep only the records whose UnitName contains a given string and remove the rest.
Something like this:
IEnumerable<UnitGroup> Groups;
IEnumerable<UnitGroup> filteredResult = Groups.NameList(o => o.UnitName.contains("test"));
The result should be an IEnumerable<UnitGroup> in which each UnitGroup's NameList holds only the UnitType entries with matching UnitNames.
What is the best way of achieving this?

I'm not 100% sure what you're trying to achieve. Could you provide some sample data to make it clearer?
That said, I think this may fit your goal:
IEnumerable<UnitGroup> Groups;
var filteredResult = Groups.Select(g => new UnitGroup {
key = g.key,
NameList = g.NameList
.Where(n => n.UnitName == "test")
.ToList()
})
.Where(g => g.NameList.Count > 0);

Here is another way that should do what @MarcinJuraszek's answer does. (I am guessing at the intent of the question as well.)
IEnumerable<UnitGroup> Groups;
var filteredResult = Groups.Where (g => g.NameList.Count() > g.NameList.RemoveAll(nl => nl.UnitName != "Name1"));
If the number of removed items was less than the original count, then we have items that are of interest, so select the parent.
Note: This will modify the original collection, so if you need to filter it more than once then this is not the answer you are looking for.

Try this:
var filteredList = from g in Groups
where g.NameList.Exists(i=>i.UnitName=="test")
select g;
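The first answer's non-mutating approach can be wrapped in a small helper that matches on "contains" rather than on equality, as the question asks. This is only a sketch: the helper name FilterByUnitName and the case-insensitive comparison are assumptions, not part of the original posts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class UnitType
{
    public string UnitName { get; set; }
    public string Description { get; set; }
}

public class UnitGroup
{
    public string key { get; set; }
    public List<UnitType> NameList { get; set; }
}

public static class UnitGroupFilter
{
    // Hypothetical helper: returns new UnitGroup objects whose NameList holds
    // only the units whose name contains `text` (case-insensitive, an assumption).
    // Groups left without any matching unit are dropped. The input is not modified.
    public static List<UnitGroup> FilterByUnitName(IEnumerable<UnitGroup> groups, string text)
    {
        return groups
            .Select(g => new UnitGroup
            {
                key = g.key,
                NameList = g.NameList
                    .Where(u => u.UnitName != null &&
                                u.UnitName.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
                    .ToList()
            })
            .Where(g => g.NameList.Count > 0)
            .ToList();
    }
}
```

Because each result is a fresh UnitGroup with a fresh list, the original collection can be filtered again with a different string, which the RemoveAll variant does not allow.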

Related

LINQ DistinctBy choosing what object to keep

I have a list of objects and I don't want to allow duplicates of a certain attribute of those objects. My understanding is that I can use DistinctBy() to remove the duplicates. My question is: of the objects that share the same attribute value, how do I choose which one to keep?
Example:
How would I go about removing any objects with a duplicate value of "year" in the list tm and keep the object with the highest value of someValue?
class TestModel{
public int year{ get; set; }
public int someValue { get; set; }
}
List<TestModel> tm = new List<TestModel>();
//populate list
//I was thinking something like this
tm.DistinctBy(x => x.year).Select(x => max(X=>someValue))
You can use GroupBy and Aggregate (LINQ had no built-in MaxBy method before .NET 6):
tm
.GroupBy(x => x.year)
.Select(g => g.Aggregate((acc, next) => acc.someValue > next.someValue ? acc : next));
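On .NET 6 or later, the same grouping can be written with Enumerable.MaxBy instead of Aggregate. The sketch below assumes that runtime; the helper name KeepHighestPerYear is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class TestModel
{
    public int year { get; set; }
    public int someValue { get; set; }
}

public static class YearFilter
{
    // Assumes .NET 6 or later, where Enumerable.MaxBy is available.
    // Keeps one item per year: the one with the highest someValue.
    public static List<TestModel> KeepHighestPerYear(IEnumerable<TestModel> items)
    {
        return items
            .GroupBy(x => x.year)
            .Select(g => g.MaxBy(x => x.someValue))
            .ToList();
    }
}
```

A group produced by GroupBy is never empty, so MaxBy always has an element to return here.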
Use GroupBy followed by the SelectMany/Take(1) pattern with an OrderByDescending:
IEnumerable<TestModel> result =
tm
.GroupBy(x => x.year)
.SelectMany(xs =>
xs
.OrderByDescending(x => x.someValue)
.Take(1));
Here's an example:
List<TestModel> tm = new List<TestModel>()
{
new TestModel() { year = 2020, someValue = 5 },
new TestModel() { year = 2020, someValue = 15 },
new TestModel() { year = 2019, someValue = 6 },
};
That gives me two items: year = 2020 with someValue = 15, and year = 2019 with someValue = 6.

LINQ GroupBy while converting from string to decimal and then back to string

Is it possible to convert a string value to a decimal value within a LINQ expression that performs an aggregate function like SUM or AVERAGE?
Assume the example below, where I have a collection of bank accounts and my goal is to obtain the average balance of each customer's accounts that have a balance. The data comes from an XML API where all the data is read in as strings.
public class BankAccount
{
public string Id { get; set; }
public string CustomerId { get; set; }
public string Balance { get; set; }
}
Sample data ...
{ Id = "1", CustomerId = "Bob", Balance = "1" }
{ Id = "2", CustomerId = "Bob", Balance = "2" }
{ Id = "3", CustomerId = "Sam", Balance = "4" }
{ Id = "4", CustomerId = "Sam", Balance = "" }
{ Id = "5", CustomerId = "Alice", Balance = "" }
Below is my LINQ grouping expression. Is there a way to convert the value of Balance to a decimal so an average can be taken within the LINQ statement? I tried x => Decimal.Parse(x.Balance) but got an "Input string was not in a correct format" error. I only need to convert the Balance property to decimal for the Average calculation, as the results will be rendered as a string in the XML.
At the same time, if an account does not have a balance listed (i.e. it's blank, like Sam's second account and Alice's only account above), I don't want that entry included in the average, though I still want the account grouped in for display.
var groupedResults = allAccounts
.GroupBy(x => new {x.CustomerId, x.Balance})
.Select(g => new BankAccount {
CustomerId = g.Customer.Key.CustomerId,
Balance = g.Average(x => x.Balance)
}).ToList();
These are the results I am looking for:
{ CustomerId = "Bob", Balance = "1.5" }
{ CustomerId = "Sam", Balance = "4" }
{ CustomerId = "Alice", Balance = "" }
I think to achieve the result you are looking for you should try this:
var groupedResults = allAccounts
.GroupBy(x =>x.CustomerId)
.Select(g => new BankAccount {
CustomerId = g.Key,
Balance = g.Where(x =>!string.IsNullOrEmpty(x.Balance))
.Select(x =>(decimal?)decimal.Parse(x.Balance))
.DefaultIfEmpty(null)
.Average().ToString()
}).ToList();
First, group by CustomerId only; it is not necessary to include the Balance there. Then, to get the average and avoid the parsing error, add a condition that makes sure the Balance is not empty.
Another way to do it using query syntax:
from e in allAccounts
group e by e.CustomerId into g
let temp=g.Where(x =>!string.IsNullOrEmpty(x.Balance))
select new BankAccount(){CustomerId = g.Key,
Balance =temp.Any()?
temp.Average(x =>Decimal.Parse(x.Balance)).ToString():""
};
decimal d;
var groupedResults = allAccounts.GroupBy(a => a.CustomerId)
.Select(g => new BankAccount { CustomerId = g.Key, Balance = g.Average(b =>
decimal.TryParse(b.Balance, out d) ? (decimal?)d : null).ToString() }).ToList();
The .TryParse part results in (decimal?)null for strings that can't be parsed, which are then ignored by .Average. Also, the last average for Alice results in (decimal?)null and then in "".
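A variant of the TryParse approach keeps the parsing in a small helper and pins the culture, so that "1.5" is read and written the same way on every machine. This is a sketch under assumptions: the names ParseOrNull and AveragePerCustomer are hypothetical, and the invariant culture is assumed to match the number format of the XML data.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class BankAccount
{
    public string Id { get; set; }
    public string CustomerId { get; set; }
    public string Balance { get; set; }
}

public static class BalanceAverager
{
    // Returns the parsed balance, or null when the text is blank or not a number.
    private static decimal? ParseOrNull(string text)
    {
        decimal value;
        return decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out value)
            ? value
            : (decimal?)null;
    }

    // One BankAccount per customer; Balance holds the average of the
    // parseable balances, or "" when the customer has none.
    public static List<BankAccount> AveragePerCustomer(IEnumerable<BankAccount> accounts)
    {
        return accounts
            .GroupBy(a => a.CustomerId)
            .Select(g =>
            {
                // Average over decimal? skips nulls and yields null if all are null.
                decimal? average = g.Average(a => ParseOrNull(a.Balance));
                return new BankAccount
                {
                    CustomerId = g.Key,
                    Balance = average.HasValue
                        ? average.Value.ToString(CultureInfo.InvariantCulture)
                        : ""
                };
            })
            .ToList();
    }
}
```

Using a local variable inside the helper also avoids sharing one captured `d` across all lambda invocations.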

enumerable group field using Linq?

I've written a LINQ statement like this:
var fs = list
.GroupBy(i =>
new {
X = i.X,
Ps = i.Properties.Where(p => p.Key.Equals("m")) // <-- this field
}
)
.Select(g => g.Key);
Am I able to group by IEnumerable.Where(...) fields?
The grouping won't work here.
When grouping, the runtime compares group keys to decide which items belong together. Here the key contains a property (Ps) that is a distinct IEnumerable<T> instance for each item in list, and those instances are compared by reference equality, not by sequence equality. Every element therefore ends up with its own key. In other words, if you have two items:
var a = new { X = 1, Properties = new[] { "m" } };
var b = new { X = 1, Properties = new[] { "m" } };
The GroupBy clause will give you two distinct keys.
If your intent is to just project the items into the structure of the GroupBy key then you don't need the grouping; the query below should give the same result:
var fs = list.Select(item => new
{
item.X,
Ps = item.Properties.Where(p => p.Key == "m")
});
However, if you do require the results to be distinct, you'll need to create a separate class for your result and implement a separate IEqualityComparer<T> to be used with Distinct clause:
public class Result
{
public int X { get; set; }
public IEnumerable<string> Ps { get; set; }
}
public class ResultComparer : IEqualityComparer<Result>
{
public bool Equals(Result a, Result b)
{
return a.X == b.X && a.Ps.SequenceEqual(b.Ps);
}
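// Hash only X: objects that are equal per Equals always share the same X
public int GetHashCode(Result obj)
{
return obj.X.GetHashCode();
}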
}
Having the above you can use Distinct on the first query to get distinct results:
var fs = list.Select(item => new Result
{
X = item.X,
Ps = item.Properties.Where( p => p.Key == "m")
}).Distinct(new ResultComparer());
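If grouping is still what you want, GroupBy also has an overload that accepts an IEqualityComparer for the key, so a comparer based on sequence equality lets the original idea work. The sketch below is built on assumptions: the Item shape (a Dictionary<string, string> for Properties) and the names GroupKey, GroupKeyComparer and GroupByXAndM are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int X { get; set; }
    public Dictionary<string, string> Properties { get; set; }
}

public class GroupKey
{
    public int X { get; set; }
    public List<string> Ps { get; set; }
}

public class GroupKeyComparer : IEqualityComparer<GroupKey>
{
    public bool Equals(GroupKey a, GroupKey b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.X == b.X && a.Ps.SequenceEqual(b.Ps);
    }

    public int GetHashCode(GroupKey key)
    {
        // Combine X with every element so that equal sequences hash equally.
        unchecked
        {
            int hash = 17 * 31 + key.X;
            foreach (string p in key.Ps)
                hash = hash * 31 + (p == null ? 0 : p.GetHashCode());
            return hash;
        }
    }
}

public static class Grouping
{
    // Groups items by X plus the values of their "m" properties.
    public static List<IGrouping<GroupKey, Item>> GroupByXAndM(IEnumerable<Item> list)
    {
        return list
            .GroupBy(
                i => new GroupKey
                {
                    X = i.X,
                    // Materialize the filtered values so the key is stable.
                    Ps = i.Properties.Where(p => p.Key == "m").Select(p => p.Value).ToList()
                },
                new GroupKeyComparer())
            .ToList();
    }
}
```

Materializing Ps with ToList means the key is evaluated once per item rather than re-running the Where query on every comparison.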

Find / Count Redundant Records in a List<T>

I am looking for a way to identify duplicate records, except that here I want and expect to see them.
The records aren't complete duplicates, but I am not concerned with the unique fields at this point. I just want to see whether they have made X payments of the exact same amount, via the exact same card, to the exact same person. (Bogus example just to illustrate.)
The collection is a List<>, and whatever X is, the List<>.Count will be X. In other words, either all the records in the list match (again, only on the fields I am concerned with) or I will reject it.
The best I can come up with is to take the first record, get the value of, say, PayAmount, and use LINQ on the others to see if they have the same PayAmount value, then repeat for every field to be matched. This seems horribly inefficient, but I can't think of a better way.
So any thoughts, ideas, pointers would be greatly appreciated.
JB
Something like this should do it.
var duplicates = list.GroupBy(x => new { x.Amount, x.CardNumber, x.PersonName })
.Where(x => x.Count() > 1);
Working example:
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
List<Entry> table = new List<Entry>();
var dup1 = new Entry
{
Name = "David",
CardNumber = 123456789,
PaymentAmount = 70.00M
};
var dup2 = new Entry
{
Name = "Daniel",
CardNumber = 987654321,
PaymentAmount = 45.00M
};
//3 duplicates
table.Add(dup1);
table.Add(dup1);
table.Add(dup1);
//2 duplicates
table.Add(dup2);
table.Add(dup2);
//Find duplicates query
var query = from p in table
group p by new { p.Name, p.CardNumber, p.PaymentAmount } into g
where g.Count() > 1
select new
{
name = g.Key.Name,
cardNumber = g.Key.CardNumber,
amount = g.Key.PaymentAmount,
count = g.Count()
};
foreach (var item in query)
{
Console.WriteLine("{0}, {1}, {2}, {3}", item.name, item.cardNumber, item.amount, item.count);
}
Console.ReadKey();
}
}
public class Entry
{
public string Name { get; set; }
public int CardNumber { get; set; }
public decimal PaymentAmount { get; set; }
}
The meat of which is this:
var query = from p in table
group p by new { p.Name, p.CardNumber, p.PaymentAmount } into g
where g.Count() > 1
select new
{
name = g.Key.Name,
cardNumber = g.Key.CardNumber,
amount = g.Key.PaymentAmount,
count = g.Count()
};
Your unique entries are based on the three criteria of Name, Card Number, and Payment Amount, so you group by them and then use .Count() to count how many entries share each combination. where g.Count() > 1 keeps only the groups that contain duplicates.
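The question also asks for an accept/reject check: every record in the list must match on the fields of interest. One way to sketch that is to project each record to those fields and count the distinct projections. The helper name AllMatch is hypothetical; the Entry class mirrors the one in the working example above.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Entry
{
    public string Name { get; set; }
    public int CardNumber { get; set; }
    public decimal PaymentAmount { get; set; }
}

public static class PaymentChecks
{
    // True when the list is non-empty and every entry shares the same
    // Name, CardNumber and PaymentAmount. Anonymous types compare by
    // value, so Distinct collapses matching projections into one.
    public static bool AllMatch(List<Entry> entries)
    {
        return entries
            .Select(e => new { e.Name, e.CardNumber, e.PaymentAmount })
            .Distinct()
            .Count() == 1;
    }
}
```

This makes a single pass over the list instead of one comparison pass per field.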

read CSV file Save output to LIST using LINQ

I have a sample CSV file as follows
1,A
2,B
3,C
Code:
var query = File.ReadAllLines("test.txt")
.Select(record => record.Split(','))
.Select(tokens => new { clearNum = tokens[0], MPID = tokens[1] });
foreach (var item in query)
{
Console.WriteLine("{0}, {1}", item.clearNum, item.MPID);
}
I am able to print the items.
I need to send the output of the LINQ query to a List:
public class icSASList
{
public string ClearNum { get; set; }
public string MPID { get; set; }
}
List<icSASList> clearList = new List<icSASList>();
After considering the accepted answer, I'd like to suggest a solution that requires fewer object initializations. If the list is large, this will make a difference.
var query = File.ReadAllLines("test.txt")
.Select(record => record.Split(','))
.Select(tokens => new icSASList(){ ClearNum = tokens[0], MPID = tokens[1] });
var clearList = query.ToList();
Note that using record.Split(',') is naive: CSV normally allows commas inside quoted (") fields, which will break your program. It is better to use a library such as http://www.filehelpers.com/.
I've not tried compiling it but I think you want something like this?
var clearList = query.Select(x=>new icSASList(){ClearNum = x.clearNum, MPID = x.MPID}).ToList();
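To illustrate why a plain Split(',') breaks on quoted fields, here is a minimal hand-rolled splitter that treats commas inside double quotes as part of the field. It is only a sketch, not a full CSV parser: the CsvLine class is hypothetical and it does not handle fields that span several lines.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line on commas, keeping commas that appear inside
    // double quotes. A doubled quote ("") inside a quoted field is read
    // as one literal quote character.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');
                        i++; // skip the second quote of the escaped pair
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing comma
        return fields;
    }
}
```

In the query above, record.Split(',') could then be swapped for CsvLine.Split(record), with the rest of the projection unchanged.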
