How to avoid Linq chaining to return null?

I have a problem with code contracts and linq. I managed to narrow the issue to the following code sample. And now I am stuck.
public void SomeMethod()
{
var list = new List<Question>();
if (list.Take(5) == null) { }
// resharper hints that condition can never be true
if (list.ForPerson(12) == null) { }
// resharper does not hint that condition can never be true
}
public static IQueryable<Question> ForPerson(this IQueryable<Question> source, int personId)
{
if(source == null) throw new ArgumentNullException();
return from q in source
where q.PersonId == personId
select q;
}
What is wrong with my linq chain? Why doesn't resharper 'complain' when analyzing the ForPerson call?
EDIT: return type for ForPerson method changed from string to IQueryable, which I meant. (my bad)

ReSharper is correct that the result of Take or Skip is never null: if there are no items, the result is an IEnumerable<Question> with no elements. To do what you want, I think you should check Any.
var query = list.Take(5);
if (!query.Any())
{
// Code here executes only if there were no items in the list.
}
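A minimal sketch of the point above, assuming a plain List<int> in place of the question's List<Question>; the TakeDemo class and FirstFiveAreEmpty method are made-up names for illustration only:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Returns true when taking the first five items yields nothing at all.
    public static bool FirstFiveAreEmpty(List<int> list)
    {
        // Take returns an empty sequence for an empty list, never null.
        IEnumerable<int> query = list.Take(5);
        return !query.Any();
    }
}
```

Calling TakeDemo.FirstFiveAreEmpty(new List<int>()) takes the "no items" branch without any null check being involved.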
But how does this warning work? ReSharper cannot know that the method never returns null from the method signature alone, and I assume that it does not reverse engineer the method body to determine that. I assume, therefore, that it has a hard-coded rule specifying that the .NET methods Skip and Take do not return null.
When you write your own custom methods, ReSharper can only make assumptions about their behaviour from the signature, and your signature allows a null return. Therefore it issues no warning. If it analyzed the method body, it could see that null is impossible and would be able to issue a warning. But analyzing code to determine its possible behaviour is an incredibly difficult task, and I doubt that JetBrains is willing to spend the money on solving this problem when it could add more useful features elsewhere at a much lower development cost.
Determining whether a boolean expression can ever evaluate to true is the Boolean satisfiability problem, which is NP-hard.
You want ReSharper to determine whether general method bodies can ever return null. This is a generalization of the NP-hard problem mentioned above. It's unlikely any tool will ever be able to do this correctly in 100% of cases.

if(source == null) throw new ArgumentNullException();
That's not the Code Contracts way. Did you mean this instead?
Contract.Requires(source != null);
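As a rough sketch of what the Code Contracts version of ForPerson might look like, with a postcondition stating that the result is never null. The Question class here is a minimal stand-in for the asker's type, and the sketch does not claim that ReSharper reads these contracts. Note that Contract.Requires and Contract.Ensures are compiled away unless the CONTRACTS_FULL symbol is defined:

```csharp
using System.Diagnostics.Contracts;
using System.Linq;

// Minimal stand-in for the Question type from the original post.
public class Question
{
    public int PersonId { get; set; }
}

public static class QuestionQueries
{
    public static IQueryable<Question> ForPerson(this IQueryable<Question> source, int personId)
    {
        // Precondition: callers must not pass a null source.
        Contract.Requires(source != null);
        // Postcondition: the returned query is never null.
        Contract.Ensures(Contract.Result<IQueryable<Question>>() != null);

        return source.Where(q => q.PersonId == personId);
    }
}
```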

Related

Byte vs boolean for OrderBy

Is there any performance benefit to using a byte over a bool in ordering?
For example, given some code:
var foo = items.OrderByDescending(item => item.SomeProperty);
The existing code to get the value of SomeProperty is:
public byte SomeProperty
{
get
{
if (a == b)
return 1;
else
return 0;
}
}
I wanted to refactor this to:
public bool SomeProperty
{
get
{
return a == b;
}
}
I was told the first is more efficient. Is this true? Are there any downsides to using a bool over a byte?
The efficiency will hardly be in the processing. It will be more in the efficiency of development: is the code easy to understand? Easy to reuse for similar items? Easy to change if the internal structure changes, without changing the interface? Easy to test?
When designing a property your first question should be: what does my property stand for? What does it mean? Does it have an identifier and type that users will expect, or will they have to look it up in the documentation because they have no idea what it means?
For instance, if you have a class that represents something persistable, like a file, and you invent a property, which one will be easier to understand:
class Persistable
{
public int IsPersisted {get;}
public bool IsPersisted {get;}
...
Which one will readers immediately know what it means?
So for now, your idea of persistence has two values: "not persisted yet" and "persisted". A boolean is enough. But you may foresee that the idea of persistence will change in the near future; for instance, the persistable could be "not persisted yet", "persisted", "changed after it has been persisted", or "deleted". If you foresee that, you have to decide whether it is still best to return a bool. Maybe you should return an enum:
public PersistencyState State {get;}
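A possible shape for such an enum, as a sketch only. The member names are derived from the four states listed above; the MarkPersisted method is a hypothetical transition added to make the example complete:

```csharp
public enum PersistencyState
{
    NotPersisted,           // "not persisted yet"
    Persisted,              // "persisted"
    ChangedAfterPersisted,  // "changed after it has been persisted"
    Deleted                 // "deleted"
}

public class Persistable
{
    // A new object starts out as not persisted.
    public PersistencyState State { get; private set; } = PersistencyState.NotPersisted;

    // Hypothetical transition, called once the object has been written to storage.
    public void MarkPersisted() => State = PersistencyState.Persisted;

    // The old boolean view can still be offered on top of the enum.
    public bool IsPersisted => State == PersistencyState.Persisted;
}
```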
Conclusion: design the identifiers and types of your properties and methods such that the learning curve for your users is low and foreseeable changes don't have a great impact. Make sure that the properties are easy to test and maintain. On rare occasions, portability is an issue.
Those items have a bigger influence on your efficiency than the two code changes.
Back to your question
If you think about what SomeProperty represents, and you think: it represents the equality of a and b, then you should use:
public bool EqualAB => a == b;
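To see how a bool property behaves in the ordering from the question, here is a small sketch. Item, A, B and OrderingDemo are hypothetical stand-ins for the asker's types. Since bool sorts false before true, OrderByDescending puts the matching items first, just as the byte version returning 1 would:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; A and B stand in for the question's a and b.
public class Item
{
    public int A { get; set; }
    public int B { get; set; }
    public bool SomeProperty => A == B;
}

public static class OrderingDemo
{
    // Items whose SomeProperty is true come first (true sorts above false).
    public static List<Item> MatchingFirst(IEnumerable<Item> items)
    {
        return items.OrderByDescending(item => item.SomeProperty).ToList();
    }
}
```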
If your question is about whether you should use "get" or =>, the first one will call something sub-routine like, while the 2nd method will insert the code. If the part after the => is fairly big, and you use it on hundreds of locations, then your code will become bigger.
But then again: if your get is really big, should you make it a property?
public string ElderName
{
get
{
myDataBase.Open();
var allCustomers = myDataBase.FetchAllCustomers().ToList();
var eldestCustomer = this.FindEldestCustomer(allCustomers);
return eldestCustomer.Name;
}
}
Well this will have a fair impact on code size if you use the => notation on 1000 locations. But honestly, designers that put this in a property instead of a method don't deserve efficient code.
Finally, I asked here on Stack Overflow whether there is a difference between:
string Name {get => this.name;}
string Name => this.name;
The answer was that both translate into the same assembly code.

Spring Data + QueryDSL empty predicate + Predicate chaining

Let me get straight to the point.
I am using Spring Data JPA with QueryDSL in a project and I cannot figure this out myself.
I have the QueryDSL predicates in static methods that take arguments, and if an argument is not valid the method should return an "empty predicate":
public static BooleanExpression byWhateverId(Long whateverId) {
if(whateverId == null) return [insert magic here];
// if parameter is OK return usual predicate
return QClass.property.whateverId.eq(whateverId);
}
Now I want to be able to chain these predicates using AND/OR operators:
someRepository.findAll(byWhateverId(someParam).and(bySomethingElseId(1)));
The problem here is that at this point I don't know whether 'someParam' is null or not (of course I can check but that's a lot of IFs).
I also know I can use the BooleanBuilder class, but that also seems like a lot of code that should not be needed.
Does anybody know what could be inserted instead of "[insert magic here]"?
Or maybe I am missing something somewhere...
Thanks!
You can return null for non-matching predicates in byWhateverId and bySomethingElseId and combine the predicates via ExpressionUtils.allOf().
In your case:
Predicate where = ExpressionUtils.allOf(byWhateverId(someParam), bySomethingElseId(1));
someRepository.findAll(where);
A 4-year-old question, but anyway...
You can return an SQL predicate which is always true, like true=true:
public static BooleanExpression alwaysTrue() {
return Expressions.TRUE.isTrue();
}
If you have a bunch of these, the generated SQL won't be super nice, so you might want to keep such usages to a minimum.
Sorry, I completely forgot about this.
The right solution (from my point of view) is to use BooleanBuilder.

MemberExpression to MemberExpression[]

The objective is to get an array of MemberExpressions from two LambdaExpressions. The first is convertible to a MethodCallExpression that returns the instance of an object (Expression<Func<T>>). The second Lambda expression would take the result of the compiled first expression and return a nested member (Expression<Func<T,TMember>>). We can assume that the second Lambda expression will only make calls to nested properties, but may do several of these calls.
So, the signature of the method I am trying to create is :
MemberExpression[] GetMemberExpressionArray<T,TValue>(Expression<Func<T>> instanceExpression, Expression<Func<T,TValue>> nestedMemberExpression)
where nestedMemberExpression will be assumed to take an argument of the form
parent => parent.ChildProperty.GrandChildProperty
and the resulting array represents the MemberAccess from parent to ChildProperty and from the value of ChildProperty to GrandChildProperty.
I have already returned the last MemberExpression using the following extension method.
public static MemberExpression GetMemberExpression<T, TValue>(Expression<Func<T, TValue>> expression)
{
if (expression == null)
{
return null;
}
if (expression.Body is MemberExpression)
{
return (MemberExpression)expression.Body;
}
if (expression.Body is UnaryExpression)
{
var operand = ((UnaryExpression)expression.Body).Operand;
if (operand is MemberExpression)
{
return (MemberExpression)operand;
}
if (operand is MethodCallExpression)
{
return ((MethodCallExpression)operand).Object as MemberExpression;
}
}
return null;
}
Now, I know there are several ways to accomplish this. The most immediately intuitive to me would be to loop through the .Expression property to get the first expression and capture references to each MemberExpression along the way. This may be the best way to do it, but it may not. I am not extraordinarily familiar with the performance costs I get from using expressions like this. I know a MemberExpression has a MemberInfo and that reflection is supposed to hurt performance.
I've tried to search for information on expressions, but my resources have been very limited in what I've found.
I would appreciate any advice on how to accomplish this task (and this type of task, in general) with optimal performance and reliability.
I'm not sure why this has been tagged performance, but the easiest way I can think of to extract member-expressions from a tree is to subclass ExpressionVisitor. This should be much simpler than manually writing the logic to 'expand' different types of expressions and walk the tree.
You'll probably have to override the VisitMember method so that:
Each member-expression is captured.
Its children are visited.
I imagine that would look something like:
protected override Expression VisitMember(MemberExpression node)
{
_myListOfMemberExpressions.Add(node);
return base.VisitMember(node);
}
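Fleshed out into a complete class, the idea might look like the sketch below. MemberCollector, Parent and Child are hypothetical names, and the sketch covers only the nested-member lambda, not the instance expression from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical model types, used only to build an example expression.
public class Child { public int GrandChildProperty { get; set; } }
public class Parent { public Child ChildProperty { get; set; } }

public sealed class MemberCollector : ExpressionVisitor
{
    private readonly List<MemberExpression> _members = new List<MemberExpression>();

    // Returns the member accesses in the lambda body, ordered from the
    // parameter outwards (ChildProperty first, then GrandChildProperty).
    public static MemberExpression[] Collect<T, TValue>(Expression<Func<T, TValue>> lambda)
    {
        var collector = new MemberCollector();
        collector.Visit(lambda.Body);
        // The visitor meets the outermost access first, so reverse the list.
        collector._members.Reverse();
        return collector._members.ToArray();
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        _members.Add(node);             // capture this member access
        return base.VisitMember(node);  // keep walking towards the parameter
    }
}
```

A call such as MemberCollector.Collect((Parent p) => p.ChildProperty.GrandChildProperty) would then yield two entries, one per property access.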
I'm slightly unclear about the remainder of your task; it appears like you want to rewrite parameter-expressions, in which case you might want to look at this answer from Marc Gravell.

How does LINQ implement the SingleOrDefault() method?

How is the method SingleOrDefault() evaluated in LINQ? Does it use a Binary Search behind the scenes?
Better than attempting to explain in words, I thought I'd just post the exact code of implementation in the .NET Framework, retrieved using the Reflector program (and reformatted ever so slightly).
public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null)
{
switch (list.Count)
{
case 0:
return default(TSource);
case 1:
return list[0];
}
}
else
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
return default(TSource);
TSource current = enumerator.Current;
if (!enumerator.MoveNext())
return current;
}
}
throw Error.MoreThanOneElement();
}
It's quite interesting to observe that an optimisation is made if the object is of type IList<T>, which seems quite sensible. Otherwise, if the object implements nothing more specific than IEnumerable<T>, it simply falls back to enumerating over the object, and does so just how you'd expect.
Note that it can't use a binary search because the object doesn't necessarily represent a sorted collection. (In fact, in almost all usage cases, it won't.)
I would assume that it simply performs the query and if the result count is zero, it returns the default instance of the class. If the result count is one, it returns that instance, and if the result count is greater than one, it throws an exception.
I don't think it does any searching, it's all about getting the first element of the source [list, result set, etc].
My best guess is that it just pulls the first element. If there is no first it returns the default (null, 0, false, etc). If there is a first, it attempts to pull the second result. If there is a second result it throws an exception. Otherwise it returns the first result.
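The three outcomes described in these answers can be checked with a short sketch; SingleOrDefaultDemo and Describe are made-up names for illustration:

```csharp
using System;
using System.Linq;

public static class SingleOrDefaultDemo
{
    // Reports what SingleOrDefault does for the given input.
    public static string Describe(int[] source)
    {
        try
        {
            int value = source.SingleOrDefault();
            return "returned " + value;
        }
        catch (InvalidOperationException)
        {
            // Thrown when the sequence holds more than one element.
            return "threw InvalidOperationException";
        }
    }
}
```

An empty array gives the default value (0 for int), a one-element array gives that element, and anything longer throws.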

Which syntax is better for return value?

I've been doing a massive code review and one pattern I notice all over the place is this:
public bool MethodName()
{
bool returnValue = false;
if (expression)
{
// do something
returnValue = MethodCall();
}
else
{
// do something else
returnValue = Expression;
}
return returnValue;
}
This is not how I would have done it; I would have just returned the value as soon as I knew what it was. Which of these two patterns is more correct?
I stress that the logic always seems to be structured such that the return value is assigned in one place only and no code is executed after it's assigned.
A lot of people recommend having only one exit point from your methods. The pattern you describe above follows that recommendation.
The main gist of that recommendation is that if you have to clean up some memory or state before returning from the method, it's better to have that code in one place only. Having multiple exit points leads to either duplication of cleanup code or potential problems due to missing cleanup code at one or more of the exit points.
Of course, if your method is a couple of lines long, or doesn't need any cleanup, you could have multiple returns.
I would have used ternary, to reduce control structures...
return expression ? MethodCall() : Expression;
I suspect I will be in the minority but I like the style presented in the example. It is easy to add a log statement and set a breakpoint, IMO. Plus, when used in a consistent way, it seems easier to "pattern match" than having multiple returns.
I'm not sure there is a "correct" answer on this, however.
Some learning institutes and books advocate the single return practice.
Whether it's better or not is subjective.
That looks like part of a bad OOP design. Perhaps it should be refactored at a higher level than inside a single method.
Otherwise, I prefer using a ternary operator, like this:
return expression ? MethodCall() : Expression;
It is shorter and more readable.
Return from a method right away in any of these situations:
You've found a boundary condition and need to return a unique or sentinel value: if (node.next == null) return NO_VALUE_FOUND;
A required value/state is false, so the rest of the method does not apply (aka a guard clause). E.g.: if (listeners == null) return null;
The method's purpose is to find and return a specific value, e.g.: if (nodes[i].value == searchValue) return i;
You're in a clause which returns a unique value from the method not used elsewhere in the method: if (userNameFromDb.equals(SUPER_USER)) return getSuperUserAccount();
Otherwise, it is useful to have only one return statement so that it's easier to add debug logging, resource cleanup and follow the logic. I try to handle all the above 4 cases first, if they apply, then declare a variable named result(s) as late as possible and assign values to that as needed.
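One way to read that advice, sketched in C# with made-up names (ReturnStyleDemo, CountLongerThan): the guard clause exits early, and everything after it flows to a single return through a result variable that is declared late.

```csharp
using System.Collections.Generic;

public static class ReturnStyleDemo
{
    // Counts the strings that are longer than minLength.
    public static int CountLongerThan(IList<string> values, int minLength)
    {
        // Guard clause: nothing to inspect, so leave right away.
        if (values == null || values.Count == 0) return 0;

        // Declared as late as possible; the remaining logic has a single exit.
        int result = 0;
        foreach (string value in values)
        {
            if (value != null && value.Length > minLength) result++;
        }
        return result;
    }
}
```

Logging or a breakpoint on the final return then covers every path except the guard.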
They both accomplish the same task. Some say that a method should only have one entry and one exit point.
I use this, too. The idea is that resources can be freed in the normal flow of the program. If you jump out of a method at 20 different places, and you need to call cleanUp() before, you'll have to add yet another cleanup method 20 times (or refactor everything)
I guess that the coder has taken the design of defining an object toReturn at the top of the method (e.g., List<Foo> toReturn = new ArrayList<Foo>();) and then populating it during the method call, and somehow decided to apply it to a boolean return type, which is odd.
Could also be a side effect of a coding standard that states that you can't return in the middle of a method body, only at the end.
Even if no code is executed after the return value is assigned now it does not mean that some code will not have to be added later.
It's not the smallest piece of code which could be used but it is very refactoring-friendly.
Delphi forces this pattern by automatically creating a variable called "Result" which will be of the function's return type. Whatever "Result" is when the function exits, is your return value. So there's no "return" keyword at all.
function MethodName : boolean;
begin
Result := False;
if Expression then begin
//do something
Result := MethodCall;
end
else begin
//do something else
Result := Expression;
end;
//possibly more code
end;
The pattern used is verbose - but it's also easier to debug if you want to know the return value without opening the Registers window and checking EAX.
