How does LINQ implement the SingleOrDefault() method? - linq

How is the method SingleOrDefault() evaluated in LINQ? Does it use a Binary Search behind the scenes?

Better than attempting to explain in words, I thought I'd just post the exact code of implementation in the .NET Framework, retrieved using the Reflector program (and reformatted ever so slightly).
public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null)
{
switch (list.Count)
{
case 0:
return default(TSource);
case 1:
return list[0];
}
}
else
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
return default(TSource);
TSource current = enumerator.Current;
if (!enumerator.MoveNext())
return current;
}
}
throw Error.MoreThanOneElement();
}
It's quite interesting to oberserve that an optimisation is made if the object is of type IList<T>, which seems quite sensible. It simply falls back to enumerating over the object otherwise if the object implements nothing more specific than IEnumerable<T>, and does so just how you'd expect.
Note that it can't use a binary search because the object doesn't necessarily represent a sorted collection. (In fact, in almost all usage cases, it won't.)

I would assume that it simply performs the query and if the result count is zero, it returns the default instance of the class. If the result count is one, it returns that instance, and if the result count is greater than one, it throws an exception.

I don't think it does any searching, it's all about getting the first element of the source [list, result set, etc].
My best guess is that it just pulls the first element. If there is no first it returns the default (null, 0, false, etc). If there is a first, it attempts to pull the second result. If there is a second result it throws an exception. Otherwise it returns the first result.

Related

Responsive asynchronous search-as-you-type in Java 8

I'm trying to implement a "search as you type" pattern in Java.
The goal of the design is that no change gets lost but at the same time, the (time consuming) search operation should be able to abort early and try with the updated pattern.
Here is what I've come up so far (Java 8 pseudocode):
AtomicReference<String> patternRef
AtomicLong modificationCount
ReentrantLock busy;
Consumer<List<ResultType>> resultConsumer;
// This is called in a background thread every time the user presses a key
void search(String pattern) {
// Update the pattern
synchronized {
patternRef.set(pattern)
modificationCount.inc()
}
try {
if (!busy.tryLock()) {
// Another search is already running, let it handle the change
return;
}
// Get local copy of the pattern and modCount
synchronized {
String patternCopy = patternRef.get();
long modCount = modificationCount.get()
}
while (true) {
// Try the search. It will return false when modificationCount changes before the search is finished
boolean success = doSearch(patternCopy, modCount)
if (success) {
// Search completed before modCount was changed again
break
}
// Try again with new pattern+modCount
synchronized {
patternCopy = patternRef.get();
modCount = modificationCount.get()
}
}
} finally {
busy.unlock();
}
}
boolean doSearch(String pattern, long modCount)
... search database ...
if (modCount != modificationCount.get()) {
return false;
}
... prepare results ...
if (modCount != modificationCount.get()) {
return false;
}
resultConsumer.accept(result); // Consumer for the UI code to do something
return modCount == modificationCount.get();
}
Did I miss some important point? A race condition or something similar?
Is there something in Java 8 which would make the code above more simple?
The fundamental problem of this code can be summarized as “trying to achieve atomicity by multiple distinct atomic constructs”. The combination of multiple atomic constructs is not atomic and trying to reestablish atomicity leads to very complicated, usually broken, and inefficient code.
In your case, doSearch’s last check modCount == modificationCount.get() happens while still holding the lock. After that, another thread (or multiple other threads) could update the search string and mod count, followed by finding the lock occupied, hence, concluding that another search is running and will take care.
But that thread doesn’t care after that last modCount == modificationCount.get() check. The caller just does if (success) { break; }, followed by the finally { busy.unlock(); } and returns.
So the answer is, yes, you have potential race conditions.
So, instead of settling on two atomic variables, synchronized blocks, and a ReentrantLock, you should use one atomic construct, e.g. a single atomic variable:
final AtomicReference<String> patternRef = new AtomicReference<>();
Consumer<List<ResultType>> resultConsumer;
// This is called in a background thread every time the user presses a key
void search(String pattern) {
if(patternRef.getAndSet(pattern) != null) return;
// Try the search. doSearch will return false when not completed
while(!doSearch(pattern) || !patternRef.compareAndSet(pattern, null))
pattern = patternRef.get();
}
boolean doSearch(String pattern) {
//... search database ...
if(pattern != (Object)patternRef.get()) {
return false;
}
//... prepare results ...
if(pattern != (Object)patternRef.get()) {
return false;
}
resultConsumer.accept(result); // Consumer for the UI code to do something
return true;
}
Here, a value of null indicates that no search is running, so if a background thread sets this to a non-null value and finds the old value to be null (in an atomic operation), it knows it has to perform the actual search. After the search, it tries to set the reference to null again, using compareAndSet with the pattern used for the search. Thus, it can only succeed if it has not changed again. Otherwise, it will fetch the new value and repeat.
These two atomic updates are already sufficient to ensure that there is only a single search operation at a time while not missing an updated search pattern. The ability of doSearch to return early when it detects a change, is just a nice to have and not required by the caller’s loop.
Note that in this example, the check within doSearch has been reduced to a reference comparison (using a cast to Object to prevent compiler warnings), to demonstrate that it can be as cheap as the int comparison of your original approach. As long as no new string has been set, the reference will be the same.
But, in fact, you could also use a string comparison, i.e. if(!pattern.equals(patternRef.get())) { return false; } without a significant performance degradation. String comparison is not (necessarily) expensive in Java. The first thing, the implementation of String’s equals does, is a reference comparison. So if the string has not changed, it will return true immediately here. Otherwise, it will check the lengths then (unlike C strings, the length is known beforehand) and return false immediately on a mismatch. So in the typical scenario of the user typing another character or pressing backspace, the lengths will differ and the comparison bail out immediately.

Is LINQ's Any method efficient?

Does the Any method in LINQ iterated over the entire collection or return true when the first successful iteration occurs?
The Any method will only iterate over the minimum number of elements necessary. As soon as it finds a matching element it will return immediately
It's roughly implemented as follows
public static bool Any<T>(this IEnumerable<T> enumerable, Func<T, bool> predicate) {
foreach (var cur in enumerable) {
if (predicate(cur)) {
return true;
}
}
return false;
}
In the worst case (none or last matching) it will visit all elements. In the best case (first matching) it will only visit 1
The latter - you can look at the code with ReSharper to verify that if you download a trial version.
As to whether Any is efficient - it's not when e.g. a Count property is available as an alternative. But it does arguably express intent well.
Any returns true as soon as it finds a successful match to the predicate, though if none exist, it will have iterated across the entire collection.

MemberExpression to MemberExpression[]

The objective is to get an array of MemberExpressions from two LambdaExpressions. The first is convertible to a MethodCallExpression that returns the instance of an object (Expression<Func<T>>). The second Lambda expression would take the result of the compiled first expression and return a nested member (Expression<Func<T,TMember>>). We can assume that the second Lambda expression will only make calls to nested properties, but may do several of these calls.
So, the signature of the method I am trying to create is :
MemberExpression[] GetMemberExpressionArray<T,TValue>(Expression<Func<T>> instanceExpression, Expression<Func<T,TValue>> nestedMemberExpression)
where nestedMemberExpression will be assumed to take an argument of the form
parent => parent.ChildProperty.GrandChildProperty
and the resulting array represents the MemberAccess from parent to ChildProperty and from the value of ChildProperty to GrandChildProperty.
I have already returned the last MemberExpression using the following extension method.
public static MemberExpression GetMemberExpression<T, TValue>(Expression<Func<T, TValue>> expression)
{
if (expression == null)
{
return null;
}
if (expression.Body is MemberExpression)
{
return (MemberExpression)expression.Body;
}
if (expression.Body is UnaryExpression)
{
var operand = ((UnaryExpression)expression.Body).Operand;
if (operand is MemberExpression)
{
return (MemberExpression)operand;
}
if (operand is MethodCallExpression)
{
return ((MethodCallExpression)operand).Object as MemberExpression;
}
}
return null;
}
Now, I know there are several ways to accomplish this. The most immediately intuitive to me would be to loop through the .Expression property to get the first expression and capture references to each MemberExpression along the way. This may be the best way to do it, but it may not. I am not extraordinarily familiar with the performance costs I get from using expressions like this. I know a MemberExpression has a MemberInfo and that reflection is supposed to hurt performance.
I've tried to search for information on expressions, but my resources have been very limited in what I've found.
I would appreciate any advice on how to accomplish this task (and this type of task, in general) with optimal performance and reliability.
I'm not sure why this has been tagged performance, but the easiest way I can think of to extract member-expressions from a tree is to subclass ExpressionVisitor. This should be much simpler than manually writing the logic to 'expand' different types of expressions and walk the tree.
You'll probably have to override the VisitMember method so that:
Each member-expression is captured.
Its children are visited.
I imagine that would look something like:
protected override Expression VisitMember(MemberExpression node)
{
_myListOfMemberExpressions.Add(node);
return base.VisitMember(node);
}
I'm slightly unclear about the remainder of your task; it appears like you want to rewrite parameter-expressions, in which case you might want to look at this answer from Marc Gravell.

How to avoid Linq chaining to return null?

I have a problem with code contracts and linq. I managed to narrow the issue to the following code sample. And now I am stuck.
public void SomeMethod()
{
var list = new List<Question>();
if (list.Take(5) == null) { }
// resharper hints that condition can never be true
if (list.ForPerson(12) == null) { }
// resharper does not hint that condition can never be true
}
public static IQueryable<Question> ForPerson(this IQueryable<Question> source, int personId)
{
if(source == null) throw new ArgumentNullException();
return from q in source
where q.PersonId == personId
select q;
}
What is wrong with my linq chain? Why doesn't resharper 'complain' when analyzing the ForPerson call?
EDIT: return type for ForPerson method changed from string to IQueryable, which I meant. (my bad)
Reshaper is correct that the result of a Take or Skip is never null - if there are no items the result is an IEnumerable<Question> which has no elements. I think to do what you want you should check Any.
var query = list.Take(5);
if (!query.Any())
{
// Code here executes only if there were no items in the list.
}
But how does this warning work? Resharper cannot know that the method never returns null from only looking at the method definition, and I assume that it does not reverse engineer the method body to determine that it never returns null. I assume therefore that it has been specially hard-coded with a rule specifying that the .NET methods Skip and Take do not return null.
When you write your own custom methods Reflector can make assumptions about your method behaviour from the interface, but your interface allows it to return null. Therefore it issues no warnings. If it analyzed the method body then it could see that null is impossible and would be able to issue a warning. But analyzing code to determine its possible behaviour is an incredibly difficult task and I doubt that Red Gate are willing to spend the money on solving this problem when they could add more useful features elsewhere with a much lower development cost.
To determine whether a boolean expression can ever return true is called the Boolean satisfiability problem and is an NP-hard problem.
You want Resharper to determine whether general method bodies can ever return null. This is a generalization of the above mentioned NP-hard problem. It's unlikely any tool will ever be able to do this correctly in 100% of cases.
if(source == null) throw new ArgumentNullException();
That's not the code contract way, do you instead mean:
Contract.Require(source != null);

Recursive Linq Function and Yielding

public static IEnumerable<UIElement> Traverse(this UIElementCollection source)
{
source.OfType<Grid>().SelectMany(v => Traverse(v.Children));
//This is the top level.
foreach (UIElement item in source)
{
yield return item;
}
}
This never returns anything recursively. I have been around the houses. The Linq chain should call back into the function/extension method but never does. The line does nothing as far as I can tell!
You are not doing anything with the result of the expression and probably the lazy evaluation is not enforced. If you really want to ignore the result of the expression, at least try adding ToArray() at the end ;) That should enforce the evaluation and recursively call your Traverse function.
Advantage of Bojan's solution (provided that's what you really want because it returns a different result than your initial one), is that the actual evaluation responsibility is shifted to the client of the Traverse method. Because in your case these are in-memory queries anyway, it is not that big of a difference, but if these were database queries there is a more significant performance penalty (count of actual database queries) for putting ToArray somewhere.
The recursive call is never executed, as you never use the result of SelectMany.
You can make this method lazy, and let the clients evaluate it when needed by
combining the result of SelectMany with the current source. Perhaps something like this would do the job (not tested):
public static IEnumerable<UIElement> Traverse(this UIElementCollection source)
{
var recursive_result = source.OfType<Grid>().SelectMany(v => Traverse(v.Children));
return recursive_result.Concat( source.Cast<UIElement>() );
}
public static IEnumerable<UIElement> Traverse(this UIElementCollection source)
{
//This is the top level.
foreach (UIElement item in source.OfType<Grid>().SelectMany(v => Traverse(v.Children)).Concat(source.Cast<UIElement>()))
{
yield return item;
}
}
This has the desired result, not sure it is optimal though!

Resources