What is does expression<T> do? - linq

What does Expression<T> do?
I have seen it used in a method similar to:
private Expression<Func<MyClass,bool>> GetFilter(...)
{
}
Can't you just return the Func<MyClass,bool> ?
Google and SO searches have failed me due to the < and > signs.

If TDelegate represents a delegate type, then Expression<TDelegate> represents a lambda expression that can be converted to a delegate of type TDelegate as an expression tree. This allows you to programatically inspect a lambda expression to extract useful information.
For example, if you have
var query = source.Where(x => x.Name == "Alan Turing");
then x => x.Name == "Alan Turning" can be inspected programatically if it's represented as an expression tree, but not so much if it's thought of as a delegate. This is particularly useful in the case of LINQ providers which will walk the expression tree to convert the lambda expression into a different representation. For example, LINQ to SQL would convert the above expression tree to
SELECT * FROM COMPUTERSCIENTIST WHERE NAME = 'Alan Turing'
It can do that because of the representation of the lambda expression as a tree whose nodes can be walked and inspected.

An Expression allows you to inspect the structure of the code inside of the delegate rather than just storing the delegate itself.
As usual, MSDN is pretty clear on the matter:
MSDN - Expression(TDelegate)

Yes, Func<> can be used in place of place of an Expression. The utility of an expression tree is that it gives remote LINQ providers such as LINQ to SQL the ability to look ahead and see what statements are required to allow the query to function. In other words, to treate code as data.
//run the debugger and float over multBy2. It will be able to tell you that it is an method, but it can't tell you what the implementation is.
Func<int, int> multBy2 = x => 2 * x;
//float over this and it will tell you what the implmentation is, the parameters, the method body and other data
System.Linq.Expressions.Expression<Func<int, int>> expression = x => 2 * x;
In the code above you can compare what data is available via the debugger. I invite you to do this. You will see that Func has very little information available. Try it again with Expressions and you will see a lot of information including the method body and parameters are visible at runtime. This is the real power of Expression Trees.

Related

Why cant nullable arrays/hashmaps be accessed using []

When I have a nullable array/list/hashmap such as
var x: ArrayList<String>? = null
I know can access the element at index 1 like so
var element = x?.get(1)
or I could do it in a unsafe way like this
var element = x!![1]
but why can't I do something like this
var element = x?[1]
what's the difference between getting elements from an array using the first example and the last example, and why is the last example not allowed?
In the first example, you're using the safe call operator ?. to call the get function with it.
In the second example, you're using the [] operator on the non-nullable return value of the x!! expression, which of course is allowed.
However, the language simply doesn't have a ?[] operator, which would be the combination of the two. The other operators offered are also don't have null-safe variants: there's no ?+ or ?&& or anything like that. This is just a design decision by the language creators. (The full list of available operators is here).
If you want to use operators, you need to call them on non-nullable expressions - only functions get the convenience of the safe call operator.
You could also define your own operator as an extension of the nullable type:
operator fun <T> List<T>?.get(index: Int) = this?.get(index)
val x: ArrayList<String>? = null
val second = x[2] // null
This would get you a neater syntax, but it hides the underlying null handling, and might confuse people who don't expect this custom extension on collections.

Why does using Linq's .Select() return IEnumerable<dynamic> even though the type is clearly defined?

I'm using Dapper to return dynamic objects and sometimes mapping them manually. Everything's working fine, but I was wondering what the laws of casting were and why the following examples hold true.
(for these examples I used 'StringBuilder' as my known type, though it is usually something like 'Product')
Example1: Why does this return an IEnumerable<dynamic> even though 'makeStringBuilder' clearly returns a StringBuilder object?
Example2: Why does this build, but 'Example1' wouldn't if it was IEnumerable<StringBuilder>?
Example3: Same question as Example2?
private void test()
{
List<dynamic> dynamicObjects = {Some list of dynamic objects};
IEnumerable<dynamic> example1 = dynamicObjects.Select(s => makeStringBuilder(s));
IEnumerable<StringBuilder> example2 = dynamicObjects.Select(s => (StringBuilder)makeStringBuilder(s));
IEnumerable<StringBuilder> example3 = dynamicObjects.Select(s => makeStringBuilder(s)).Cast<StringBuilder>();
}
private StringBuilder makeStringBuilder(dynamic s)
{
return new StringBuilder(s);
}
With the above examples, is there a recommended way of handling this? and does casting like this hurt performance? Thanks!
When you use dynamic, even as a parameter, the entire expression is handled via dynamic binding and will result in being "dynamic" at compile time (since it's based on its run-time type). This is covered in 7.2.2 of the C# spec:
However, if an expression is a dynamic expression (i.e. has the type dynamic) this indicates that any binding that it participates in should be based on its run-time type (i.e. the actual type of the object it denotes at run-time) rather than the type it has at compile-time. The binding of such an operation is therefore deferred until the time where the operation is to be executed during the running of the program. This is referred to as dynamic binding.
In your case, using the cast will safely convert this to an IEnumerable<StringBuilder>, and should have very little impact on performance. The example2 version is very slightly more efficient than the example3 version, but both have very little overhead when used in this way.
While I can't speak very well to the "why", I think you should be able to write example1 as:
IEnumerable<StringBuilder> example1 = dynamicObjects.Select<dynamic, StringBuilder>(s => makeStringBuilder(s));
You need to tell the compiler what type the projection should take, though I'm sure someone else can clarify why it can't infer the correct type. But I believe by specifying the projection type, you can avoid having to actually cast, which should yield some performance benefit.

Building a lambda with an Expression Tree

I'm having a hard time grasping the expression trees. I would like to be able to build an expression tree manually for the following statement:
c => c.Property
A lot of the tutorials focus around comparing, while I just want it to return this one property. Any help?
ParameterExpression parameter = Expression.Parameter(typeof(YourClass), "c");
Expression property = Expression.PropertyOrField(parameter, "Property");
Expression<Func<YourClass, PropertyType>> lamda = Expression.Lambda<Func<YourClass, PropertyType>>(property, parameter);

Doesn't IQueryable.Expression = Expression.Constant(this); cause an infinite loop?

Expression trees represent code in a tree-like data structure, where each node is an expression. In Linq they are used by linq providers to convert them into native language of a target process.
I know very little of expression trees, but I've been reading the following code where author uses Expression.Constant(this) to describe the initial query. Thus according to author Expression.Constant(this) should enable provider to retrieve the initial sequence of elements for someQuery.
But to my understanding this should instead cause an infinite loop, since expression tree in someQuery.Expression is describing someQuery object and not the details of the query itself ( or to put it differently, if target platform is SQL DB, Expression.Constant(this) doesn't describe in non-sql terms which rows or tables a query should retrieve from a DB ). And thus when provider looks into someQuery.Expression, it will only find description D of someQuery object.And if it further inspects the details of D.Expression property, it again finds the description of someQuery object and so on – thus infinite loop:
public class Query<T> : IQueryable<T>...
{
QueryProvider provider;
Expression expression;
public Query(QueryProvider provider) {
this.provider = provider;
this.expression = Expression.Constant(this);
}
...
}
Query<string> someQuery = new Query<string>();
thank you
I'd expect the provider to have knowledge of the query type and know that when it hit a constant expression of type Query<T>, it had hit a leaf, effectively. Sooner or later, the provider has to get to something describing "the whole table" or an equivalent. Of course the Query<T> would need the information about which table etc, but in a full example I'd expect it to have that information.
(Out of interest, am I the author in question? I wrote something very similar in C# in Depth...)

LINQ Performance

What exactly is happening behind the scenes in a LINQ query against an object collection? Is it just syntactical sugar or is there something else happening making it more of an efficient query?
Do you mean in terms of a query expression, or what the query does behind the scenes?
Query expressions are expanded into "normal" C# first. For example:
var query = from x in source
where x.Name == "Fred"
select x.Age;
is translated to:
var query = source.Where(x => x.Name == "Fred")
.Select(x => x.Age);
The exact meaning of this depends on the type of source of course... in LINQ to Objects, it typically implements IEnumerable<T> and the Enumerable extension methods come into play... but it could be a different set of extension methods. (LINQ to SQL would use the Queryable extension methods, for example.)
Now, suppose we are using LINQ to Objects... after extension method expansion, the above code becomes:
var query = Enumerable.Select(Enumerable.Where(source, x => x.Name == "Fred"),
x => x.Age);
Next the implementations of Select and Where become important. Leaving out error checking, they're something like this:
public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
foreach (T element in source)
{
if (predicate(element))
{
yield return element;
}
}
}
public static IEnumerable<TResult> Select<TSource, TResult>
(this IEnumerable<TSource> source,
Func<TSource, TResult> selector)
{
foreach (TSource element in source)
{
yield return selector(element);
}
}
Next there's the expansion of iterator blocks into state machines, which I won't go into here but which I have an article about.
Finally, there's the conversion of lambda expressions into extra methods + appropriate delegate instance creation (or expression trees, depending on the signatures of the methods called).
So basically LINQ uses a lot of clever features of C#:
Lambda expression conversions (into delegate instances and expression trees)
Extension methods
Type inference for generic methods
Iterator blocks
Often anonymous types (for use in projections)
Often implicit typing for local variables
Query expression translation
However, the individual operations are quite simple - they don't perform indexing etc. Joins and groupings are done using hash tables, but straightforward queries like "where" are just linear. Don't forget that LINQ to Objects usually just treats the data as a forward-only readable sequence - it can't do things like a binary search.
Normally I'd expect hand-written queries to be marginally faster than LINQ to Objects as there are fewer layers of abstraction, but they'll be less readable and the performance difference usually won't be significant.
As ever for performance questions: when in doubt, measure!
If you need better performance, consider trying i4o - Index for Objects. It build in-memory objects for large collections (think 100,000+ rows), which LINQ then uses to speed up queries. You need a lot of data to make this work, but the improvements are impressive.
http://www.codeplex.com/i4o
It's just syntactic sugar - there's no magic involved.
You could write out the equivalent code in "longhand", in C# or whatever, and it would perform the same.
(The compiler will do a good job of producing efficient code, of course, so the code it produces might be a fraction more efficient than the code you would write yourself, simply because you might not know the most performant way to write that code.)

Resources