minimum value in dictionary using linq - linq

I have a dictionary of type
Dictionary<DateTime,double> dictionary
How can I retrive a minimum value and key coresponding to this value from this dictionary using linq ?

var min = dictionary.OrderBy(kvp => kvp.Value).First();
var minKey = min.Key;
var minValue = min.Value;
This is not very efficient though; you might want to consider MoreLinq's MinBy extension method.
If you are performing this query very often, you might want to consider a different data-structure.

Aggregate
var minPair = dictionary.Aggregate((p1, p2) => (p1.Value < p2.Value) ? p1 : p2);
Using the mighty Aggregate method.
I know that MinBy is cleaner in this case, but with Aggregate you have more power and its built-in. ;)

Dictionary<DateTime, double> dictionary;
//...
double min = dictionary.Min(x => x.Value);
var minMatchingKVPs = dictionary.Where(x => x.Value == min);
You could combine it of course if you really felt like doing it on one line, but I think the above is easier to read.
var minMatchingKVPs = dictionary.Where(x => x.Value == dictionary.Min(y => y.Value));

You can't easily do this efficiently in normal LINQ - you can get the minimal value easily, but finding the key requires another scan through. If you can afford that, use Jess's answer.
However, you might want to have a look at MinBy in MoreLINQ which would let you write:
var pair = dictionary.MinBy(x => x.Value);
You'd then have the pair with both the key and the value in, after just a single scan.
EDIT: As Nappy says, MinBy is also in System.Interactive in Reactive Extensions.

Related

LINQ: Improving performance of "query to find all dictionaries from list of dictionaries where given key has at least one value from list of values"

I tried searching for existing questions, but I could not find anything, so apologize if this is duplicate question.
I have following piece of code. This code runs in a loop for different values of key and listOfValues (listOfDict does not change and built only once, key and listOfValues vary for each iteration). This code currently works, but profiler shows that 50% of the execution time is spent in this LINQ query. Can I improve performance - using different LINQ construct perhaps?
// List of dictionary that allows multiple values against one key.
List<Dictionary<string, List<string>>> listOfDict = BuildListOfDict();
// Following code & LINQ query runs in a loop.
List<string> listOfValues = BuildListOfValues();
string key = GetKey();
// LINQ query to find all dictionaries from listOfDict
// where given key has at least one value from listOfValues.
List<Dictionary<string, List<string>>> result = listOfDict
.Where(dict => dict[key]
.Any(lhs => listOfValues.Any(rhs => lhs == rhs)))
.ToList();
Using HashSet will perform significantly better. You can create a HashSet<string> like so:
IEnumerable<string> strings = ...;
var hashSet = new HashSet<string>(strings);
I assume you can change your methods to return HashSets and make them run like this:
List<Dictionary<string, HashSet<string>>> listOfDict = BuildListOfDict();
HashSet<string> listOfValues = BuildListOfValues();
string key = GetKey();
List<Dictionary<string, HashSet<string>>> result = listOfDict
.Where(dict => listOfValues.Overlaps(dict[key]))
.ToList();
Here HashSet's instance method Overlaps is used. HashSet is optimized for set operations like this. In a test using one dictionary of 200 elements this runs in 3% of the time compared to your method.
UPDATED: Per #GertArnold, switched from Any/Contains to HashSet.Overlaps for slight performance improvement.
Depending on whether listOfValues or the average value for a key is longer, you can either convert listOfValues to a HashSet<string> or build your list of dictionaries to have a HashSet<string> for each value:
// optimize testing against listOfValues
var valHS = listOfValues.ToHashSet();
var result2 = listOfDict.Where(dict => valHS.Overlaps(dict[key]))
.ToList();
// change structure to optimize query
var listOfDict2 = listOfDict.Select(dict => dict.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.ToHashSet())).ToList();
var result3 = listOfDict2.Where(dict => dict[key].Overlaps(listOfValues))
.ToList();
Note: if the query is repeated with differing listOfValues, it probably makes more sense to build the HashSet in the dictionaries once, rather than computing a HashSet from each listOfValues.
#LasseVågsætherKarlsen suggestion in comments to invert the structure intrigued me, so with a further refinement to handle the multiple keys, I created an index structure and tested lookups. With my Test Harness, this is about twice as fast as using a HashSet for one of the List<string>s and four times faster than the original method:
var listOfKeys = listOfDict.First().Select(d => d.Key);
var lookup = listOfKeys.ToDictionary(k => k, k => listOfDict.SelectMany(d => d[k].Select(v => (v, d))).ToLookup(vd => vd.v, vd => vd.d));
Now to filter for a particular key and list of values:
var result4 = listOfValues.SelectMany(v => lookup[key][v]).Distinct().ToList();

Turn 2 linq expressions into

Is there anyway to change the following 2 linq expressions into 1?
var criticalCategories =
_commonDao.GetAllByExpression<CategoryItem>(
x => x.Category.Uid == gridAnswer.ActivityCategory.Uid && x.Critical);
if(criticalCategories.Any())
{
criticalWeight = criticalCategories.Min(x => x.Weight);
}
You can use Enumerable.DefaultIfEmpty to make sure that Min will produce a specific value if your source sequence contains no elements.
You could then write:
var criticalCategories = _commonDao.GetAllByExpression<CategoryItem>(...);
criticalWeight = criticalCategories
.Select(x => x.Weight)
.DefaultIfEmpty(42)
.Min();
The above is trivially chainable, but I did not actually chain it here because I 'm not quite sure how criticalCategories is supposed to be used later on (if at all). Could you please clarify?

Truncating a collection using Linq query

I want to extract part of a collection to another collection.
I can easily do the same using a for loop, but my linq query is not working for the same.
I am a neophyte in Linq, so please help me correcting the query (if possible with explanation / beginners tutorial link)
Legacy way of doing :
Collection<string> testColl1 = new Collection<string> {"t1", "t2", "t3", "t4"};
Collection<string> testColl2 = new Collection<string>();
for (int i = 0; i < newLength; i++)
{
testColl2.Add(testColl1[i]);
}
Where testColl1 is the source & testColl2 is the desired truncated collection of count = newLength.
I have used the following linq queries, but none of them are working ...
var result = from t in testColl1 where t.Count() <= newLength select t;
var res = testColl1.Where(t => t.Count() <= newLength);
Use Enumerable.Take:
var testColl2 = testColl1.Take(newLength).ToList();
Note that there's a semantic difference between your for loop and the version using Take. The for loop will throw with IndexOutOfRangeException exception if there are less than newLength items in testColl1, whereas the Take version will silently ignore this fact and just return as many items up to newLength items.
The correct way is by using Take:
var result = testColl1.Take(newLength);
An equivalent way using Where is:
var result = testColl1.Where((i, item) => i < newLength);
These expressions will produce an IEnumerable, so you might also want to attach a .ToList() or .ToArray() at the end.
Both ways return one less item than your original implementation does because it is more natural (e.g. if newLength == 0 no items should be returned).
You could convert to for loop to something like this:
testColl1.Take(newLength)
Use Take:
var result = testColl1.Take(newLength);
This extension method returns the first N elements from the collection where N is the parameter you pass, in this case newLength.

TableServiceContext and dynamic query

I m trying to do something that look very simple but I hit massive difficulties when I want to make that more dynamic.
Expression<Func<TableServiceEntity, bool>> predicate = (e) => e.PartitionKey == "model" && (e.RowKey == "home" || e.RowKey == "shared");
context.CreateQuery<TableServiceEntity>(tableName).Where(predicate);
I would like to pass an array of rowKey instead of having to hard code the predicate.
When I try to build an expression tree I receive a not supported exception I think it doesn't support invoking as part of the expression tree.
Does someone know how to build and expression tree exactly as the predicate to avoid the not supported exception?
Thank you by advance
So, you can build the query dynamically by using something like this (taken from PhluffyFotos sample):
Expression<Func<PhotoTagRow, bool>> search = null;
foreach (var tag in tags)
{
var id = tag.Trim().ToLowerInvariant();
if (String.IsNullOrEmpty(id))
{
continue;
}
Expression<Func<PhotoTagRow, bool>> addendum = t => t.PartitionKey == id;
if (search == null)
{
search = addendum;
}
else
{
search = Expression.Lambda<Func<PhotoTagRow, bool>>(Expression.OrElse(search.Body, addendum.Body), search.Parameters);
}
}
Now, once you have 'search' you can just pass that as the predicate in your Where clause.
However, I want to convince you not to do this. I am answering your question, but telling you that it is a bad idea to do a multiple '|' OR clause in Table storage. The reason is that today at least, these queries cannot be optimized and they cause a full table scan. The performance will be horrendous with any non-trivial amount of data. Furthermore, if you build your predicates dynamically like this you run the risk of blowing the URL limit (keep that in mind).
This code in PhluffyFotos shows how, but it is actually a bad practice (I know, I wrote it). It really should be optimized to run each OR clause separately in parallel. That is how you really should do it. AND clauses are ok, but OR clauses should be parallelized (use PLINQ or TPL) and you should aggregate the results. It will be much faster.
HTH.
I believe what HTH said about this kind of query doing a full table scan is incorrect from the documentation I have read. Azure will perform a PARTITION scan rather than a TABLE scan which is a big difference in performance.
Here is my solution please read also the answer from HTH who pointed out that this is not a best practice.
var parameter = Expression.Parameter(typeof(TableServiceEntity), "e");
var getPartitionKey = typeof(TableServiceEntity).GetProperty("PartitionKey").GetGetMethod();
var getRowKey = typeof(TableServiceEntity).GetProperty("RowKey").GetGetMethod();
var getPartition = Expression.Property(parameter, getPartitionKey);
var getRow = Expression.Property(parameter, getRowKey);
var constPartition = Expression.Constant("model", typeof(string));
var constRow1 = Expression.Constant("home", typeof(string));
var constRow2 = Expression.Constant("shared", typeof(string));
var equalPartition = Expression.Equal(getPartition, constPartition);
var equalRow1 = Expression.Equal(getRow, constRow1);
var equalRow2 = Expression.Equal(getRow, constRow2);
var and = Expression.AndAlso(equalPartition, Expression.OrElse(equalRow1, equalRow2));
return Expression.Lambda<Func<TableServiceEntity, bool>>(and, parameter);

improving readiblity and be slightly less explicit using Linq

All,
I've got the following code, and looking into ways to improve its readibility (and remove the null check) by using Linq.
var activePlan = CurrentPlans.First();
var activeObjectives = activePlan != null ? activePlan.Objectives : null;
The closest I get is the following:
var activeObjectives = CurrentPlans.Take(1).Select(x => x.Objectives);
which gives me a collection of x.Objectives instead of Objectives.
Any ideas?
I'd write if like this:
var activeObjectives = CurrentPlans.Select(x => x.Objectives).FirstOrDefault();
This way, it's easier to work out the intention by the use of methods. Take the first set of objectives, otherwise the default (null assuming Objectives refers to a reference type). Using SelectMany() for this case isn't the best choice IMO.
oh got it:
var activeObjectives = CurrentPlans.Take(1).SelectMany(x => x.Objectives)
http://msdn.microsoft.com/en-us/library/system.linq.enumerable.selectmany.aspx

Resources