Best way to get max value in LINQ? - linq

I'm new to LINQ. I would like to know the highest value of 'Date'. Which method is preferred?
var max1= spResult.Where(p =>p.InstrumentId== instrument).OrderByDescending(u => int.Parse(u.Date)).FirstOrDefault();
var max2= spResult.Where(p =>p.InstrumentId== instrument).Max(u => int.Parse(u.Date));
Max or OrderByDescending?

Max is better for both the developer and the computer.
Max will always be the better choice because its name says exactly what it does.
Enumerable.Max Method
Returns the maximum value in a sequence of values.
msdn
You want the max value? Use Max. You want to order? Use OrderBy. The next developer will thank you. To quote Martin Fowler:
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
If you really want to use OrderBy to play the role of Max, at least wrap the OrderBy and the First in a method with a meaningful name. Something like ... Max. Great, now you have a meaningful OrderBy.
Let's see how this custom Max will do.
Enumerable.Max is O(n) even in the worst case, whereas OrderBy uses a quicksort, which is O(n log n) on average and O(n^2) in the worst case. So the custom Max is worse than the standard one...
Enjoy the performance bonus and go for Enumerable.Max. It is better for both the developer and the computer.
Edit:
Check Marco's answer to see how they perform in practice. A horse race is always a good way to find out which one is faster.
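As a rough sketch of the difference on in-memory data: Max hands back the parsed value itself, while OrderByDescending(...).FirstOrDefault() hands back the whole element (or null when nothing matches). The Row type, the sample values and the yyyyMMdd date format are assumptions made for illustration; the question does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the rows in spResult.
public class Row
{
    public int InstrumentId { get; set; }
    public string Date { get; set; }
}

public static class MaxDemo
{
    // Hypothetical sample data; dates are assumed to be yyyyMMdd strings.
    public static List<Row> Sample() => new List<Row>
    {
        new Row { InstrumentId = 1, Date = "20150101" },
        new Row { InstrumentId = 1, Date = "20150301" },
        new Row { InstrumentId = 2, Date = "20150401" },
    };

    // Single pass; returns the maximum parsed date.
    // Throws InvalidOperationException if no row matches the instrument.
    public static int MaxDate(IEnumerable<Row> rows, int instrument) =>
        rows.Where(p => p.InstrumentId == instrument)
            .Max(u => int.Parse(u.Date));

    // Sorts the filtered rows; returns the whole row with the highest date,
    // or null if no row matches the instrument.
    public static Row RowWithMaxDate(IEnumerable<Row> rows, int instrument) =>
        rows.Where(p => p.InstrumentId == instrument)
            .OrderByDescending(u => int.Parse(u.Date))
            .FirstOrDefault();

    public static void Main()
    {
        var rows = Sample();
        Console.WriteLine(MaxDate(rows, 1));
        Console.WriteLine(RowWithMaxDate(rows, 1).Date);
    }
}
```

So besides the cost, the two calls are not interchangeable: pick Max when you only need the value, and keep in mind that it throws on an empty sequence of non-nullable values.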

.Max() should be faster. More importantly, the semantics of the method are clearer, so your colleagues will know what your call does.
I've compared both your options on the AdventureWorks2014 database, with the following calls in LinqPad:
var times = new List<long>();
for(var i = 0; i < 1000; i++) {
Stopwatch sw = Stopwatch.StartNew();
var max2= SalesOrderHeaders.Max(u => u.OrderDate);
long elapsed = sw.ElapsedMilliseconds;
times.Add(elapsed);
}
var averageElapsed = times.Sum (t => t) / times.Count();
averageElapsed.Dump(" ms");
Generated SQL:
SELECT MAX([t0].[OrderDate]) AS [value]
FROM [Sales].[SalesOrderHeader] AS [t0]
GO
Result:
5 ms
var times = new List<long>();
for(var i = 0; i < 1000; i++) {
Stopwatch sw = Stopwatch.StartNew();
var max1 = SalesOrderHeaders.OrderByDescending(u => u.OrderDate).FirstOrDefault();
long elapsed = sw.ElapsedMilliseconds;
times.Add(elapsed);
}
var averageElapsed = times.Sum (t => t) / times.Count();
averageElapsed.Dump(" ms");
Generated SQL:
SELECT TOP (1) [t0].[SalesOrderID], [t0].[RevisionNumber], [t0].[OrderDate], [t0].[DueDate], [t0].[ShipDate], [t0].[Status], [t0].[OnlineOrderFlag], [t0].[SalesOrderNumber], [t0].[PurchaseOrderNumber], [t0].[AccountNumber], [t0].[CustomerID], [t0].[SalesPersonID], [t0].[TerritoryID], [t0].[BillToAddressID], [t0].[ShipToAddressID], [t0].[ShipMethodID], [t0].[CreditCardID], [t0].[CreditCardApprovalCode], [t0].[CurrencyRateID], [t0].[SubTotal], [t0].[TaxAmt], [t0].[Freight], [t0].[TotalDue], [t0].[Comment], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
FROM [Sales].[SalesOrderHeader] AS [t0]
ORDER BY [t0].[OrderDate] DESC
GO
Result:
28 ms
Conclusion: Max() is more concise and faster!

Purely speculative, but I'd imagine max2 is faster. It just loops through each item and checks whether the value is higher than the highest seen so far.
max1, on the other hand, compares items and reorders them. Even if it's just moving pointers around (rather than moving values), this is still more work.
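Written out by hand, "loop through each item and keep the highest one" might look like the sketch below. It only illustrates the idea and is not the actual Enumerable.Max source; ManualMax is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class ManualMaxDemo
{
    // Hypothetical helper: one pass, one comparison per element, no reordering.
    public static int ManualMax(IEnumerable<int> values)
    {
        using (var e = values.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            int max = e.Current;           // highest value seen so far
            while (e.MoveNext())
            {
                if (e.Current > max)
                    max = e.Current;
            }
            return max;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ManualMax(new[] { 3, 9, 4 }));
    }
}
```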

Max is better than OrderByDescending followed by FirstOrDefault. Both find the maximum, but Max performs better.
This code:
var max1= spResult.Where(p =>p.InstrumentId== instrument).OrderByDescending(u => int.Parse(u.Date)).FirstOrDefault();
first applies your filter, then sorts everything that passed it by your key, and only then takes the first element, so it does more work to reach the same result.

Related

How can I get the "actual" count of elements in an IEnumerable?

If I write:
for (int i = 0; i < Strutture.Count(); i++)
{
}
and Strutture is an IEnumerable with 200 elements, IIS crashes. That's because, as far as I can see, every call to Strutture.Count() executes all the LINQ queries linked to that IEnumerable.
So, how can I get the "current" number of elements? Do I need a list?
"That's because I see every time I do Strutture.Count() it executes all LINQ queries linked with that IEnumerable."
Without doing that, how would it know how many elements there are?
For example:
Enumerable.Range(0,1000).Where(i => i % 2==0).Skip(100).Take(5).Count();
Without executing the LINQ, how could you know how many elements there are?
If you want to know how many elements there are in the source (e.g. Enumerable.Range) then I suggest you use a reference to that source and query it directly. E.g.
var numbers = Enumerable.Range(0,1000);
numbers.Count();
Also keep in mind some data sources don't really have a concept of 'Count' or if they do it involves going through every single item and counting them.
Lastly, if you're using .Count() repetitively [and you don't expect the value to actually change] it can be a good idea to cache:
var count = numbers.Count();
for (int i =0; i<count; i++) // Do Something
Supplemental:
"At first Count(), LINQ queries are executes. Than, for the next, it just "check" the value :) Not "execute the LINQ query again..." :)" - Markzzz
Then why don't we do that?
var query = Enumerable.Range(0,1000).Where(i => i % 2==0).Skip(100).Take(5); // no Count() here: keep the query itself
var result = query.ToArray(); //Gets and stores the result!
var count = result.Length; // reading Length does not re-run the query
:)
"But when I do the first "count", it should store (after the LINQ queries) the new IEnumerable (the state is changed). If I do again .Count(), why LINQ need to execute again ALL queries." - Markzzz
Because you're creating a query that gets compiled down into X, Y, Z. You're running the same query twice; however, the result may vary between runs.
For example, check this out:
static void Main(string[] args)
{
var dataSource = Enumerable.Range(0, 100).ToList();
var query = dataSource.Where(i => i % 2 == 0);
//Run the query once and return the count:
Console.WriteLine(query.Count()); //50
//Now let's modify the data source - remembering this could be a table in a db etc.
dataSource.AddRange(Enumerable.Range(100, 100));
//Run the query again and return the count:
Console.WriteLine(query.Count()); //100
Console.ReadLine();
}
This is why I recommended storing the results of the query above!
Materialize the number:
int number = Strutture.Count();
for (int i = 0; i < number; i++)
{
}
or materialize the list:
var list = Strutture.ToList();
for (int i = 0; i < list.Count; i++)
{
}
or use a foreach
foreach(var item in Strutture)
{
}
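To make the cost of calling Count() in the loop condition visible, here is a small sketch that counts how often the Where predicate runs. The query is a made-up stand-in for Strutture (10 source items, 5 of which pass the filter); the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Number of times the Where predicate has run.
    public static int Evaluations;

    // Hypothetical deferred query standing in for Strutture.
    public static IEnumerable<int> BuildQuery() =>
        Enumerable.Range(0, 10).Where(i => { Evaluations++; return i % 2 == 0; });

    // Calls Count() in the loop condition, as in the question:
    // the whole query is re-executed on every check of the condition.
    public static int EvaluationsWithCountInCondition()
    {
        Evaluations = 0;
        var query = BuildQuery();
        for (int i = 0; i < query.Count(); i++) { }
        return Evaluations;
    }

    // Calls Count() once and reuses the number: the query runs a single time.
    public static int EvaluationsWithCachedCount()
    {
        Evaluations = 0;
        var query = BuildQuery();
        int count = query.Count();
        for (int i = 0; i < count; i++) { }
        return Evaluations;
    }

    public static void Main()
    {
        Console.WriteLine(EvaluationsWithCountInCondition()); // 60: six condition checks, 10 predicate calls each
        Console.WriteLine(EvaluationsWithCachedCount());      // 10: one execution of the query
    }
}
```

With a query that hits a database or does expensive work per element, the first variant multiplies that work by the number of loop iterations plus one.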

Truncating a collection using Linq query

I want to extract part of a collection to another collection.
I can easily do this with a for loop, but my LINQ query isn't working.
I am a neophyte in LINQ, so please help me correct the query (if possible, with an explanation or a link to a beginner's tutorial).
Legacy way of doing it:
Collection<string> testColl1 = new Collection<string> {"t1", "t2", "t3", "t4"};
Collection<string> testColl2 = new Collection<string>();
for (int i = 0; i < newLength; i++)
{
testColl2.Add(testColl1[i]);
}
Where testColl1 is the source & testColl2 is the desired truncated collection of count = newLength.
I have used the following linq queries, but none of them are working ...
var result = from t in testColl1 where t.Count() <= newLength select t;
var res = testColl1.Where(t => t.Count() <= newLength);
Use Enumerable.Take:
var testColl2 = testColl1.Take(newLength).ToList();
Note that there's a semantic difference between your for loop and the version using Take. The for loop will throw an ArgumentOutOfRangeException if there are fewer than newLength items in testColl1, whereas the Take version will silently ignore this fact and just return as many items as there are, up to newLength.
The correct way is by using Take:
var result = testColl1.Take(newLength);
An equivalent way using Where is:
var result = testColl1.Where((item, i) => i < newLength); // the index is the second lambda parameter
These expressions will produce an IEnumerable, so you might also want to attach a .ToList() or .ToArray() at the end.
Both ways return the first newLength items, matching a loop that runs while i < newLength (e.g. if newLength == 0, no items are returned).
You could convert the for loop to something like this:
testColl1.Take(newLength)
Use Take:
var result = testColl1.Take(newLength);
This extension method returns the first N elements from the collection where N is the parameter you pass, in this case newLength.
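If the result has to be a Collection<string> again, as in the question, a sketch along these lines could work. Truncate is a hypothetical helper name; it relies on the Collection<T> constructor that wraps an existing IList<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public static class TruncateDemo
{
    // Hypothetical helper: copies at most newLength leading items
    // into a new Collection<string>. Never throws for a short source.
    public static Collection<string> Truncate(Collection<string> source, int newLength) =>
        new Collection<string>(source.Take(newLength).ToList());

    public static void Main()
    {
        var testColl1 = new Collection<string> { "t1", "t2", "t3", "t4" };

        Console.WriteLine(string.Join(",", Truncate(testColl1, 2))); // t1,t2
        Console.WriteLine(Truncate(testColl1, 10).Count);            // 4: Take stops at the end of the source
    }
}
```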

How to optimize this linq to objects query?

I'm matching entities from some in-memory lists with a .Contains (subselect) query to filter out old users from new ones.
Checking for performance problems, I saw this:
The oldList mostly has around 1000 users in them, while the new list varies from 100 to 500. Is there a way to optimize this query?
Absolutely - build a set instead of checking a list each time:
// Change string to whatever the type of UserID is.
var oldUserSet = new HashSet<string>(oldList.Select(o => o.UserID));
var newUsers = NewList.Where(n => !oldUserSet.Contains(n.UserID))
.ToList();
The containment check on a HashSet should be O(1) assuming few hash collisions, instead of the O(N) of checking each against the whole sequence (for each new user).
You could make a HashSet<T> of your user IDs in advance. This will cause the Contains to become an O(1) operation:
var oldSet = new HashSet<int>(oldList.Select(o => o.UserID));
var newUsers = NewList.Where(n => !oldSet.Contains(n.UserID)).ToList();
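For completeness, a self-contained sketch of the HashSet approach. The User type with an int UserID is an assumption (the question does not show the real entity), and FindNewUsers is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type; adjust UserID's type to match the real entity.
public class User
{
    public int UserID { get; set; }
    public string Name { get; set; }
}

public static class NewUsersDemo
{
    public static List<User> FindNewUsers(IEnumerable<User> oldList, IEnumerable<User> newList)
    {
        // Build the set once: a single pass over oldList.
        var oldIds = new HashSet<int>(oldList.Select(o => o.UserID));

        // Each Contains is a hash lookup rather than a scan of oldList.
        return newList.Where(n => !oldIds.Contains(n.UserID)).ToList();
    }

    public static void Main()
    {
        var oldList = new List<User> { new User { UserID = 1 }, new User { UserID = 2 } };
        var newList = new List<User> { new User { UserID = 2 }, new User { UserID = 3 } };

        foreach (var u in FindNewUsers(oldList, newList))
            Console.WriteLine(u.UserID); // 3
    }
}
```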
While those HashSet<T> answers are straightforward and simple, some may prefer a linq-centric solution.
LINQ to Objects implements join and GroupJoin with a hash-based lookup. Just use one of those - this example uses GroupJoin:
List<User> newUsers =
(
from n in NewList
join o in oldList on n.UserID equals o.UserID into oldGroup
where !oldGroup.Any()
select n
).ToList();

minimum value in dictionary using linq

I have a dictionary of type
Dictionary<DateTime,double> dictionary
How can I retrieve the minimum value, and the key corresponding to that value, from this dictionary using LINQ?
var min = dictionary.OrderBy(kvp => kvp.Value).First();
var minKey = min.Key;
var minValue = min.Value;
This is not very efficient though; you might want to consider MoreLinq's MinBy extension method.
If you are performing this query very often, you might want to consider a different data-structure.
Aggregate
var minPair = dictionary.Aggregate((p1, p2) => (p1.Value < p2.Value) ? p1 : p2);
Using the mighty Aggregate method.
I know that MinBy is cleaner in this case, but with Aggregate you have more power and it's built in. ;)
Dictionary<DateTime, double> dictionary;
//...
double min = dictionary.Min(x => x.Value);
var minMatchingKVPs = dictionary.Where(x => x.Value == min);
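You could of course combine it into one line if you really felt like it, but I think the above is easier to read. Note that the one-line version also recomputes the minimum for every entry, so it is O(n^2) rather than O(n).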
var minMatchingKVPs = dictionary.Where(x => x.Value == dictionary.Min(y => y.Value));
You can't easily do this efficiently in normal LINQ - you can get the minimal value easily, but finding the key requires another scan through. If you can afford that, use Jess's answer.
However, you might want to have a look at MinBy in MoreLINQ which would let you write:
var pair = dictionary.MinBy(x => x.Value);
You'd then have the pair with both the key and the value in, after just a single scan.
EDIT: As Nappy says, MinBy is also in System.Interactive in Reactive Extensions.
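If pulling in a library is not an option, the single-scan idea can be sketched by hand. This is not MoreLINQ's implementation, just an illustration of one pass that keeps the pair with the smallest value; MinByValue is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class MinDemo
{
    // Hypothetical helper: one scan, returns the pair with the smallest value.
    // On ties, the first pair encountered wins.
    public static KeyValuePair<DateTime, double> MinByValue(Dictionary<DateTime, double> dictionary)
    {
        if (dictionary.Count == 0)
            throw new InvalidOperationException("Dictionary is empty");

        bool first = true;
        var best = default(KeyValuePair<DateTime, double>);
        foreach (var pair in dictionary)
        {
            if (first || pair.Value < best.Value)
            {
                best = pair;
                first = false;
            }
        }
        return best;
    }

    public static void Main()
    {
        var dictionary = new Dictionary<DateTime, double>
        {
            { new DateTime(2011, 1, 1), 3.5 },
            { new DateTime(2011, 1, 2), 1.25 },
            { new DateTime(2011, 1, 3), 2.0 },
        };

        var min = MinByValue(dictionary);
        Console.WriteLine(min.Key.ToString("yyyy-MM-dd") + " -> " + min.Value);
    }
}
```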

How to prevent double round trip with Linq and ToArray() Method

I am trying to use an Array instead of a list in my query. But I must get the count first before I can iterate through the objects returned from the database. Here is my code:
var FavArray = favorites.OrderByDescending(y => y.post_date).Skip((page - 1) * config.MaxRowsPerPage).Take(config.MaxRowsPerPage).ToArray();
int FavArrayCount = FavArray.Count(); //Is this a round trip to the database?
for (int y = 0; y < FavArrayCount; y++)
{
q = new PostType();
q.Title = FavArray[y].post_title;
q.Date = FavArray[y].post_date;
q.PostID = FavArray[y].post_id;
q.Username = FavArray[y].user_username;
q.UsernameLowered = FavArray[y].user_username.ToLower();
q.CategoryID = FavArray[y].catid;
q.CategoryName = FavArray[y].name;
q.TitleSlug = FavArray[y].post_titleslug;
}
As you can see, I need the count before I start iterating, and I am worried that getting the count may make a trip to the database. Is this true?
FavArray.Count() will not round trip, because you have already converted it to an array, which is no longer "LINQ-ified".
Once you call ToArray, any operations on the array that it returns will not go back to the server (unless you follow a foreign key, i.e. access a lazily loaded related entity).
LINQ methods such as Count() that you call on the array will use regular LINQ to Objects and will be completely unaware of SQL Server.
In addition to other comments (it definitely won't round trip; it's just an array), you can just use FavArray.Length.
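A small sketch of why neither Count() nor Length goes back to the source after ToArray. An in-memory iterator that counts its own enumerations stands in for the database query here; that substitution, and all the names, are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToArrayDemo
{
    // How many times the source has been enumerated.
    public static int Enumerations;

    // Hypothetical stand-in for a database-backed query.
    public static IEnumerable<int> Source()
    {
        Enumerations++;                       // runs once per enumeration
        for (int i = 0; i < 5; i++)
            yield return i;
    }

    public static (int Enumerations, int Count, int Length) Run()
    {
        Enumerations = 0;

        // The only place the source is enumerated.
        var favArray = Source().Skip(1).Take(3).ToArray();

        int viaCount = favArray.Count();      // LINQ to Objects on the array
        int viaLength = favArray.Length;      // plain array property

        return (Enumerations, viaCount, viaLength);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine(r.Enumerations);    // 1
        Console.WriteLine(r.Count + " " + r.Length);
    }
}
```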

Resources