Are methods such as XContainer.Elements, XContainer.Nodes, etc. also considered LINQ to XML query operators?

The book uses different terms for LINQ to XML methods/properties defined in the classes XObject, XNode, XElement, etc. (such as XContainer.Elements, XContainer.Nodes, XObject.Document...) and for the extension methods defined in the Extensions class. For the former it uses the term methods, while for the extension methods it uses the term query operators.
Is there a particular reason why the author uses two different terms, or are XContainer.Elements, XContainer.Nodes, etc. also considered LINQ to XML query operators?
Thank you

Ultimately I doubt that these terms are specified anywhere in a particularly definitive way - and I wouldn't worry too much about it.
I wouldn't be surprised to see the author using the terms inconsistently themselves. I'd be even less surprised if that were the case and the author turned out to be me ;)

I'm not sure which book you are referring to, but the Elements/Nodes/etc. methods are considered axis methods (http://msdn.microsoft.com/en-us/library/bb387099.aspx). I would think the query operators would be things like Select/Where/OrderBy, regardless of whether they exist directly on the type in question or are static extension methods.
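A minimal sketch of the distinction, using a made-up `<library>`/`<book>` document: `XContainer.Elements` is an axis method defined on the type itself, `Extensions.Elements` is the same axis exposed as an extension method over a sequence, and `Where`/`Select` are the standard query operators from `Enumerable`. The `AxisDemo` class and the XML shape are illustrative only.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AxisDemo
{
    // Titles of <book> elements whose <price> exceeds the threshold.
    public static string[] ExpensiveTitles(XDocument doc, decimal threshold) =>
        doc.Root
           .Elements("book")                                     // XContainer.Elements: axis method on the type
           .Where(b => (decimal)b.Element("price") > threshold)  // Enumerable.Where: standard query operator
           .Select(b => (string)b.Element("title"))              // Enumerable.Select: standard query operator
           .ToArray();

    // Extensions.Elements: the same axis as an extension method over a sequence
    // of elements, so it can be chained directly after another axis call.
    public static int CountAllChapters(XDocument doc) =>
        doc.Root.Elements("book").Elements("chapter").Count();
}
```

Both flavours of `Elements` live in `System.Xml.Linq`; the second one is defined in the `Extensions` class the question mentions.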


Would you abstract your LINQ queries into extension methods

On my current project we set ourselves some goals for the code metrics "Maintainability Index" and "Cyclomatic Complexity": Maintainability Index should be 60 or higher and Cyclomatic Complexity 25 or less. We know that a Maintainability Index of 60 or higher is a fairly demanding target.
We also use a lot of LINQ to filter/group/select entities. I found out that these LINQ queries don't score very high on Maintainability Index.
Abstracting these queries into extension methods gives me a higher Maintainability Index, which is good. But in most cases the extension methods are no longer generic, because I use them with my own types instead of generic types.
For example the following linq-query vs extension method:
LINQ query:
List.Where(m => m.BeginTime >= selectionFrom && m.EndTime <= selectionTo)
Extension method:
public static IEnumerable<MyType> FilterBy(this IEnumerable<MyType> source, DateTime selectionFrom, DateTime selectionTo)
{
    return (IEnumerable<MyType>)source.Where(m => m.BeginTime >= selectionFrom && m.EndTime <= selectionTo);
}
List.FilterBy(selectionFrom, selectionTo);
The extension method gives me a Maintainability Index improvement of 6 points, and a nice fluent syntax.
On the other hand, I have to add a static class, and the method is not generic.
Any ideas on which approach you would favor? Or do you have other ideas about how to refactor the LINQ queries to improve the Maintainability Index?
You shouldn't add classes for the sake of metrics. Any metrics are meant to make your code better but following rules blindly, even the best rules, may in fact harm your code.
I don't think it's a good idea to stick to certain Maintainability and Complexity indexes. I believe they are useful for evaluating old code, i.e. when you inherited a project and need to estimate its complexity. However, it's absurd to extract a method because you haven't scored enough points.
Only refactor if such refactoring adds value to the code. Such value is a complex human metric inexpressible in numbers, and estimating it is exactly what programming experience is about—finding balance between optimization vs readability vs clean API vs cool code vs simple code vs fast shipping vs generalization vs specification, etc.
This is the only metric you should follow but it's not always the metric everyone agrees upon...
As for your example, if the same LINQ query is used over and over, it makes perfect sense to create an EnumerableExtensions class in an Extensions folder and extract it there. However, if it is used once or twice, or is subject to change, the verbose query is so much better.
I also don't understand why you say they are not generic, with somewhat negative connotations. You don't need generics everywhere! In fact, when writing extension methods, you should choose the most specific types you can, so as not to pollute other classes' method sets. If you want your helper to only work with IEnumerable<MyType>, there is absolutely no shame in declaring an extension method exactly for IEnumerable<MyType>. By the way, there's a redundant cast in your example. Get rid of it.
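A sketch of the question's `FilterBy` with the redundant cast removed and still declared only for `IEnumerable<MyType>`. `MyType` here is a stand-in with just the two properties the query uses, and the class name `MyTypeEnumerableExtensions` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's entity type; only the properties the filter needs.
public class MyType
{
    public DateTime BeginTime { get; set; }
    public DateTime EndTime { get; set; }
}

public static class MyTypeEnumerableExtensions
{
    // Declared for IEnumerable<MyType> only, so it won't appear on other sequences.
    // No cast is needed: Where already returns IEnumerable<MyType>.
    public static IEnumerable<MyType> FilterBy(
        this IEnumerable<MyType> source, DateTime selectionFrom, DateTime selectionTo) =>
        source.Where(m => m.BeginTime >= selectionFrom && m.EndTime <= selectionTo);
}
```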
And don't forget, tools are stupid. So are we, humans.
My advice to you would be ... don't be a slave to your metrics! They are machine generated and only intended to be used as guidance. They are never going to be a replacement for a skilled experienced programmer.
Which do you think is right for your application?
I for one agree with the extension method strategy. I've used it without a problem in a handful of real-world apps.
To me, it is not only about the metrics, but also about the reusability of the code. See the following pseudo-examples:
var x = _repository.Customers().WhichAreGoldCustomers();
var y = _repository.Customers().WhichAreBehindInPayments();
Having those two extension methods accomplishes your goal for metrics, and it also provides "one place for the definition of what it is to be a gold customer." You don't have different queries being created in different places by different developers when they need to work with "gold customers."
Additionally, they are composable:
var z = _repository.Customers().WhichAreGoldCustomers().WhichAreBehindInPayments();
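A minimal sketch of how the two pseudo-examples might be written so that they compose. `Customer`, its properties, and the 10,000 spend threshold are hypothetical; the point is that each business rule lives in exactly one method. Extending `IQueryable<Customer>` rather than `IEnumerable<Customer>` keeps the predicates as expression trees, so a database-backed repository could translate them, assuming its provider supports the operations used.

```csharp
using System;
using System.Linq;

// Hypothetical entity; property names are illustrative only.
public class Customer
{
    public string Name { get; set; } = "";
    public decimal LifetimeSpend { get; set; }
    public DateTime? OldestUnpaidInvoiceDue { get; set; }
}

public static class CustomerQueryExtensions
{
    // The one place that defines what a "gold customer" is (threshold is made up).
    public static IQueryable<Customer> WhichAreGoldCustomers(this IQueryable<Customer> customers) =>
        customers.Where(c => c.LifetimeSpend >= 10000m);

    // Behind in payments: an unpaid invoice whose due date has already passed.
    public static IQueryable<Customer> WhichAreBehindInPayments(this IQueryable<Customer> customers) =>
        customers.Where(c => c.OldestUnpaidInvoiceDue.HasValue
                          && c.OldestUnpaidInvoiceDue.Value < DateTime.Today);
}
```

Here `_repository.Customers()` is assumed to return `IQueryable<Customer>`; for an in-memory list, `list.AsQueryable()` lets the same extensions be used in tests.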
IMHO this is a winning approach.
The only problem we've faced is a ReSharper bug where IntelliSense for the extension methods sometimes goes crazy: you type ".Whic" and it lets you pick the extension method you want, but when you "tab" on it, it inserts something completely different into the code, not the extension method you selected. I've considered dropping ReSharper over this, but... nah :)
NO: in this case I would ignore the cyclomatic complexity - what you had originally was better.
Ask yourself what is more explanatory. This:
List.Where(m => m.BeginTime >= selectionFrom && m.EndTime <= selectionTo)
or this:
List.FilterBy(selectionFrom, selectionTo);
The first clearly expresses what you want, whereas the second does not. The only way to know what "FilterBy" means is to go into the source code and look at its implementation.
Abstracting query fragments into extension methods makes sense with more complex scenarios, where it's not easy to judge at a glance what the query fragment is doing.
I have used this technique in places, for example a class Payment has a corresponding class PaymentLinqExtensions which provides domain specific extensions for Payments.
In the example you give, I'd choose a more descriptive method name. There is also the question of whether the range is inclusive or exclusive. Otherwise it looks OK.
If you have multiple objects in your system for which the concept of having a date is common, then consider an interface, maybe IHaveADate (or something better :-))
public static IQueryable<T> WithinDateRange<T>(this IQueryable<T> source, DateTime from, DateTime to) where T : IHaveADate
(IQueryable is interesting. An IEnumerable<T> is not an IQueryable<T>, so you can't simply cast one to the other, although Queryable.AsQueryable() will wrap it. If you're working with database queries, IQueryable allows your logic to appear in the final SQL that is sent to the server, which is good. There is a potential gotcha with all LINQ: your code is not executed when you expect it to be.)
If date ranges are an important concept in your application, and you need to be consistent about whether the range ends at midnight at the end of "EndDate" or at midnight at the start of it, then a DateRange class may be useful. Then:
public static IQueryable<T> WithinDateRange<T>(this IQueryable<T> source, DateRange range) where T : IHaveADate
You could also, if you feel like it, provide:
public static IEnumerable<T> WithinDateRange<T>(this IEnumerable<T> source, DateRange range, Func<T, DateTime> getDate)
but to me this feels more like something that belongs to DateRange. I don't know how much it would be used, though your situation may vary. I've found that getting too generic can make things hard to understand, and LINQ can be hard to debug.
var filtered = myThingCollection.WithinDateRange(myDateRange, x => x.Date);
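Pulling the answer's fragments together, here is one possible sketch. `IHaveADate`, `Thing`, `DateRange` and its inclusive-whole-day convention are all assumptions chosen for illustration; the value of `DateRange` is that the convention is decided in one place. The `IQueryable` overload keeps the filter translatable by a database provider, while the `IEnumerable` overload takes a `Func<T, DateTime>` selector for types that don't implement the interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface from the answer; "Date" is an assumed member name.
public interface IHaveADate
{
    DateTime Date { get; }
}

// Hypothetical entity used only to illustrate the constraint.
public class Thing : IHaveADate
{
    public DateTime Date { get; set; }
}

// Hypothetical value type that settles the convention once: both ends are
// inclusive whole days, so the range stops at midnight at the START of the
// day after End.
public readonly struct DateRange
{
    public DateTime Start { get; }
    public DateTime End { get; }

    public DateRange(DateTime start, DateTime end)
    {
        if (end < start) throw new ArgumentException("end must not precede start");
        Start = start.Date;
        End = end.Date;
    }

    // First instant after the range.
    public DateTime EndExclusive => End.AddDays(1);
}

public static class DateRangeExtensions
{
    // Stays an expression tree, so a database provider can translate it.
    // The 'class' constraint avoids an interface cast that some providers
    // reject (an assumption worth checking against your provider).
    public static IQueryable<T> WithinDateRange<T>(this IQueryable<T> source, DateRange range)
        where T : class, IHaveADate
    {
        DateTime start = range.Start;
        DateTime endExclusive = range.EndExclusive;
        return source.Where(x => x.Date >= start && x.Date < endExclusive);
    }

    // In-memory variant for types that do not implement IHaveADate.
    public static IEnumerable<T> WithinDateRange<T>(
        this IEnumerable<T> source, DateRange range, Func<T, DateTime> getDate)
    {
        DateTime start = range.Start;
        DateTime endExclusive = range.EndExclusive;
        return source.Where(x =>
        {
            DateTime d = getDate(x);
            return d >= start && d < endExclusive;
        });
    }
}
```

Comparing against the first instant after the range (`< EndExclusive`) rather than `<= End` is what lets a timestamp late on the final day count as inside the range.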

The "Dial-able" Power Principle (aka?)

As a designer, I like providing interfaces that cater to a power/simplicity balance. For example, I think the LINQ designers followed that principle because they offered both dot-notation and query-notation. The first is more powerful, but the second is easier to read and follow. If you disagree with my assessment of LINQ, please try to see my point anyway; LINQ was just an example, my post is not about LINQ.
I call this principle "dial-able power". But I'd like to know what other people call it. Certainly some will say "KISS" is the common term. But I see KISS as a superset, or a "consumerism" practice. Using LINQ as my example again, in my view, a team of programmers who always try to use query notation over dot-notation are practicing KISS. Thus the LINQ designers practiced "dial-able power", whereas the LINQ consumers practice KISS. The two make beautiful music together.
Edit: I'll give another example. Imagine a logging tool with two signatures allowing two uses:
void Write(string message);
void Write(Func<string> messageCallback);
The purpose of the two signatures is to fulfill these needs:
//Every-day "simple" usage, nothing special.
myLogger.Write("Something Happened" + error.ToString());
//This is performance critical, do not call ToString() if logging is
//disabled.
myLogger.Write(() => "Something Happened" + error.ToString());
Having these overloads represents "dial-able power," because the consumer has the choice of a simple interface or a powerful interface. A KISS-loving consumer will use the simpler signature most of the time, and will allow the "busy" looking signature when the power is needed. This also helps self-documentation, because usage of the powerful signature tells the reader that the code is performance critical. If the logger had only the powerful signature, then there would be no "dial-able power."
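A minimal sketch of a logger offering both signatures. The `Logger` class, its `Enabled` flag and the `Lines` buffer are hypothetical; what matters is that the `Func<string>` overload never builds the message when logging is disabled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical logger illustrating the two overloads from the question.
public class Logger
{
    public bool Enabled { get; set; } = true;
    public List<string> Lines { get; } = new List<string>();

    // Simple overload: the caller has already built the message.
    public void Write(string message)
    {
        if (Enabled) Lines.Add(message);
    }

    // Powerful overload: the message is built only if it will actually be logged.
    public void Write(Func<string> messageCallback)
    {
        if (Enabled) Lines.Add(messageCallback());
    }
}
```

With `Enabled = false`, `log.Write(() => BuildExpensiveMessage())` never calls the callback, whereas `log.Write(BuildExpensiveMessage())` always pays for building the string (`BuildExpensiveMessage` being any hypothetical costly method).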
So this comes full circle. I'm happy to keep my own "dial-able power" coinage if none yet exists, but I can't help thinking I'm missing an obvious designation for this practice.
p.s. Another example that is related, but is not the same as "dial-able power", is Scott Meyers's principle "make interfaces easy to use correctly, and hard to use incorrectly."
If your "dial" has only two positions/levels, it sounds like you're simply referring to a façade.
"Progressive disclosure."
You may already be acquainted with the term because of its use with user interfaces -- e.g., "More" buttons. However, the concept is more general.
From "Universal Principles of Design," by Lidwell, Holden and Butler:
Progressive disclosure involves separating information into
multiple layers and only presenting layers that are necessary or relevant.
I call this principle "dial-able power". But I'd like to know what other people call it.
I've personally never heard of "dial-able power", and I don't think it's an industry-standard term.
In the case of LINQ, we'd refer to its design as a fluent interface.
Fluent interfaces are designed so that methods on an object return that same object (or a value of the same type), which makes method chaining easy. You see the same fluent design in the StringBuilder.Append overloads, Fluent NHibernate, and RhinoMocks.
In the case of jQuery, it also uses fluent interfaces for method chaining, but I believe "query" or "DSL" is the proper name for its selector notation.
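A small sketch of the pattern: each configuration method returns `this`, just as `StringBuilder.Append` returns the same builder. `ReportBuilder` is a hypothetical class, not part of any library.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical fluent builder: every configuration method returns 'this',
// so calls can be chained.
public class ReportBuilder
{
    private readonly List<string> _columns = new List<string>();
    private string _title = "";

    public ReportBuilder Title(string title) { _title = title; return this; }
    public ReportBuilder Column(string name) { _columns.Add(name); return this; }

    public string Build() =>
        new StringBuilder()        // StringBuilder.Append also returns the same builder
            .Append(_title)
            .Append(": ")
            .Append(string.Join(", ", _columns))
            .ToString();
}
```

For example, `new ReportBuilder().Title("Sales").Column("Region").Column("Total").Build()` yields `"Sales: Region, Total"`.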
(Obj-C selectors use the same terminology, but describe something completely different.)
Since it's described as a querying DSL, most people can infer that it takes a sequence as input and returns a sequence as output. The query notation performs roughly the same function as XPath, with more bells and whistles.
Hibernate HQL is a querying DSL on top of many SQL dialects, and in a very superficial way regexes are a querying DSL which transform string sequences into a new set of string sequences (you can, in principle, make a fluent interface for regexes, but it would probably make you claw your eyes out).

drawbacks of linq

What are the drawbacks of LINQ in general?
Can be hard to understand when you first start out with it
Deferred execution can separate errors from their causes (in terms of time)
Out-of-process LINQ (e.g. LINQ to SQL) will always be a somewhat leaky abstraction - you need to know what works and what doesn't, essentially
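To illustrate the deferred-execution point: defining a LINQ to Objects query runs nothing, so an error in the data only surfaces when the query is enumerated, possibly far from where it was written. `DeferredDemo` is a hypothetical helper for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Building the query does no division yet: Select just stores the lambda.
    public static IEnumerable<int> HundredDividedBy(IEnumerable<int> values) =>
        values.Select(v => 100 / v);

    // Reports where the DivideByZeroException actually surfaces.
    public static string WhereDoesItFail()
    {
        var source = new List<int> { 5, 4 };
        IEnumerable<int> query;
        try { query = HundredDividedBy(source); }
        catch (DivideByZeroException) { return "definition"; }

        source.Add(0); // still seen by the query, because nothing has run yet

        try { query.ToList(); }
        catch (DivideByZeroException) { return "enumeration"; }
        return "nowhere";
    }
}
```

`DeferredDemo.WhereDoesItFail()` returns `"enumeration"`: the `0` added after the query was defined is still seen, and the division only happens inside `ToList()`.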
I still love LINQ massively though :)
EDIT: Having written this short list, I remembered that I've got an answer to a very similar question...
The biggest pain with LINQ is that (with database backends) you can't use it over a repository interface without it being a leaky abstraction.
LINQ is fantastic within a layer (especially the DAL etc), but since different providers support different things, you can't rely on Expression<Func<...>> or IQueryable<T> features working the same for different implementations.
As examples, between LINQ-to-SQL and Entity Framework:
EF doesn't support Single()
EF will error if you Skip/Take/First without an explicit OrderBy
EF doesn't support UDFs
etc. The LINQ provider for ADO.NET Data Services supports different combinations. This makes mocking and other abstractions unsafe.
But: for in-memory (LINQ-to-Objects), or in a single layer/implementation... fantastic.
Some more thoughts here: Pragmatic LINQ.
Like any abstraction in programming, it is vulnerable to a misunderstanding: "If I just understand this abstraction, I don't need to understand what's happening under the covers."
The truth is, if you do understand what's happening under the covers, you'll get much better value out of the abstraction, because you'll understand where it ceases to be applicable, so you'll be able to apply it with greater confidence of success where it is appropriate.
This is true of all abstractions, and applies to Linq in bucketfuls. To understand Linq to Objects, the best thing to do is to learn how to write Select, Where, Aggregate, etc. in C# with yield return. And then figure out how yield return replaces a lot of hand-written code by writing it all with classes. Then you'll be able to use it with an appreciation of the effort it is saving you, and it will no longer seem like magic, so you'll understand the limitations.
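Following that suggestion, a minimal sketch of `Where`, `Select` and `Aggregate` written by hand. The `My*` names are hypothetical, chosen to avoid clashing with `System.Linq`. Each lazy operator is split into an eager argument check plus a private iterator, because code inside an iterator block (including argument validation) doesn't run until the first `MoveNext()` call.

```csharp
using System;
using System.Collections.Generic;

public static class MyLinq
{
    // Where: validate eagerly, then hand off to the lazy iterator.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));
        return WhereIterator(source, predicate);
    }

    private static IEnumerable<T> WhereIterator<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
            if (predicate(item))
                yield return item; // hand one item back, then pause until the next is requested
    }

    // Select: same shape as Where, but transforms every item.
    public static IEnumerable<TResult> MySelect<T, TResult>(this IEnumerable<T> source, Func<T, TResult> selector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        return SelectIterator(source, selector);
    }

    private static IEnumerable<TResult> SelectIterator<T, TResult>(IEnumerable<T> source, Func<T, TResult> selector)
    {
        foreach (T item in source)
            yield return selector(item);
    }

    // Aggregate is not lazy: it must consume the whole sequence to produce one value.
    public static TAccumulate MyAggregate<T, TAccumulate>(
        this IEnumerable<T> source, TAccumulate seed, Func<TAccumulate, T, TAccumulate> func)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (func == null) throw new ArgumentNullException(nameof(func));
        TAccumulate acc = seed;
        foreach (T item in source)
            acc = func(acc, item);
        return acc;
    }
}
```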
Same for the variants of Linq where the predicates are captured as expressions and transported off to another environment to be executed. You have to understand how it works in order to safely use it.
So the number 1 drawback of Linq is: the simple examples look deceptively short and simple. The problem is, how did the author of the sample know what to write? Because they knew how to write it all out in long form, and they knew how pieces of Linq could be used as abbreviations, and so they arrived at the nice short version.
As I say, not really specific to Linq, but highly relevant to it anyway.
Anonymous types. A proper ORM should always return objects of 'your' type (a partial class, with the possibility of adding my own methods, overriding, etc.). There are dozens of tutorials and examples of different complex queries using LINQ, but none of them cares to explain the advantage of returning a 'bag of properties' (return new { ... }). How am I supposed to work with an anonymous type: wrap it in another class again?
Actually, I can't think of any drawbacks. It makes programming life a lot simpler, because a lot of things can be written in a more compact but still readable way.
But having said this, I must also agree with Jon that you should have some idea of what you're doing (but that holds for all technological advances).
The only drawback it has is its performance; see this article.
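One common answer to that last question is to project into a named class instead of `new { ... }`. A minimal sketch with hypothetical `Order` and `CustomerTotal` types; with a database provider it is safer to project into a plain DTO like this than into a mapped entity type, since some providers reject constructing entities inside a query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical source entity.
public class Order
{
    public string Customer { get; set; } = "";
    public decimal Total { get; set; }
}

// A named result type: unlike an anonymous type, it can be returned from a
// method, appear in signatures, and be extended (e.g. as a partial class).
public class CustomerTotal
{
    public string Customer { get; set; } = "";
    public decimal Total { get; set; }
}

public static class ProjectionDemo
{
    public static List<CustomerTotal> TotalsByCustomer(IEnumerable<Order> orders) =>
        orders.GroupBy(o => o.Customer)
              .Select(g => new CustomerTotal { Customer = g.Key, Total = g.Sum(o => o.Total) })
              .OrderBy(t => t.Customer)
              .ToList();
}
```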

A DSL for Linq Queries - looking for ideas

I am currently using a CMS which uses an ORM with its own bespoke query language (i.e. with select/where/orderby like statements). I refer to this mini-language as a DSL, but I might have the terminology wrong.
We are writing controls for this CMS, but I would prefer not to couple the controls to the CMS, because we have some doubts about whether we want to continue with this CMS in the longer term.
We can decouple our controls from the CMS fairly easily, by using our own DAL/abstraction layer or whatnot.
Then I remembered that on most of the CMS controls, they provide a property (which is design-time editable) where users can type in a query to control what gets populated in the data source. Nice feature - the question is how can I abstract this feature?
It then occurred to me that maybe a DSL framework existed out there that could provide me with a simple query language that could be turned into a LINQ expression at runtime. Thus decoupling me from the CMS' query DSL.
Does such a thing exist? Am I wasting my time? (probably the latter)
Thanks
This isn't going to answer your question fully, but there is an extension for LINQ called Dynamic LINQ that allows you to specify predicates for LINQ queries as strings, so if you want to store the conditions in some string-based format, you could probably build your language on top of it. You'd still need to find a way to represent the different clauses (where/orderby/etc.), but for the predicates passed as arguments to these, you could use Dynamic LINQ.
Note that Dynamic LINQ allows you to parse the string, but AFAIK doesn't have any way to turn an existing expression tree into that string... so some work would be needed to do that.
(But I'm not sure I fully understand the question, so maybe I'm totally off :-))
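To get a feel for the "turn a simple query language into a LINQ expression at runtime" idea without any extra library, here is a deliberately tiny sketch built on `System.Linq.Expressions`. The grammar (`Property op literal`, a single comparison, no `and`/`or`, no nullable properties) and the `MiniQuery` class are hypothetical; Dynamic LINQ handles far more than this.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

public static class MiniQuery
{
    // Parses the hypothetical grammar "Property op literal" (e.g. "Length >= 3")
    // into an expression tree that IQueryable<T>.Where can accept.
    public static Expression<Func<T, bool>> ParsePredicate<T>(string text)
    {
        string[] parts = text.Split(new[] { ' ' }, 3, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3)
            throw new FormatException("Expected 'Property op literal'.");

        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        MemberExpression property = Expression.Property(x, parts[0]); // throws if T has no such property

        // Convert the literal text to the property's type; quotes around strings are stripped.
        object value = Convert.ChangeType(parts[2].Trim('\''), property.Type, CultureInfo.InvariantCulture);
        ConstantExpression constant = Expression.Constant(value, property.Type);

        Expression body = parts[1] switch
        {
            "="  => Expression.Equal(property, constant),
            "!=" => Expression.NotEqual(property, constant),
            ">"  => Expression.GreaterThan(property, constant),
            ">=" => Expression.GreaterThanOrEqual(property, constant),
            "<"  => Expression.LessThan(property, constant),
            "<=" => Expression.LessThanOrEqual(property, constant),
            _    => throw new FormatException($"Unknown operator '{parts[1]}'.")
        };
        return Expression.Lambda<Func<T, bool>>(body, x);
    }
}
```

With a real data source, `query.Where(MiniQuery.ParsePredicate<Article>(userText))` would then work on any `IQueryable<Article>`, where `Article` stands for whatever entity a control binds to.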

How does Linq work (behind the scenes)?

I was thinking about making something like Linq for Lua, and I have a general idea how Linq works, but was wondering if there was a good article or if someone could explain how C# makes Linq possible
Note: I mean behind the scenes, like how it generates code bindings and all that, not end user syntax.
It's hard to answer the question because LINQ is so many different things. For instance, sticking to C#, the following things are involved:
Query expressions are "pre-processed" into "C# without query expressions" which is then compiled normally. The query expression part of the spec is really short - it's basically a mechanical translation which doesn't assume anything about the real meaning of the query, beyond "order by is translated into OrderBy/ThenBy/etc".
Delegates are used to represent arbitrary actions with a particular signature, as executable code.
Expression trees are used to represent the same thing, but as data (which can be examined and translated into a different form, e.g. SQL)
Lambda expressions are used to convert source code into either delegates or expression trees.
Extension methods are used by most LINQ providers to chain together static method calls. This allows a simple interface (e.g. IEnumerable<T>) to effectively gain a lot more power.
Anonymous types are used for projections - where you have some disparate collection of data, and you want bits of each of the aspects of that data, an anonymous type allows you to gather them together.
Implicitly typed local variables (var) are used primarily when working with anonymous types, to maintain a statically typed language where you may not be able to "speak" the name of the type explicitly.
Iterator blocks are usually used to implement in-process querying, e.g. for LINQ to Objects.
Type inference is used to make the whole thing a lot smoother - there are a lot of generic methods in LINQ, and without type inference it would be really painful.
Code generation is used to turn a model (e.g. DBML) into code
Partial types are used to provide extensibility to generated code
Attributes are used to provide metadata to LINQ providers
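Two of those points can be seen side by side in a short sketch: the compiler's mechanical translation of a query expression into extension-method calls, and the same lambda text becoming either a delegate or an expression tree depending on its target type. `BehindTheScenes` is a hypothetical name; the method-syntax version shows what the compiler essentially produces for this particular query.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class BehindTheScenes
{
    // Query expression syntax...
    public static int[] QuerySyntax(int[] xs) =>
        (from x in xs
         where x > 2
         orderby x descending
         select x * 10).ToArray();

    // ...and the method calls it is essentially rewritten into.
    public static int[] MethodSyntax(int[] xs) =>
        xs.Where(x => x > 2).OrderByDescending(x => x).Select(x => x * 10).ToArray();

    // The same lambda text, compiled to executable code...
    public static readonly Func<int, bool> AsDelegate = x => x > 2;

    // ...or captured as data that a provider can inspect and translate.
    public static readonly Expression<Func<int, bool>> AsTree = x => x > 2;

    // Inspecting the tree: its body is a comparison node.
    public static string DescribeTree() => AsTree.Body.NodeType.ToString();
}
```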
Obviously a lot of these aren't only used by LINQ, but different LINQ technologies will depend on them.
If you can give more indication of what aspects you're interested in, we may be able to provide more detail.
If you're interested in effectively implementing LINQ to Objects, you might be interested in a talk I gave at DDD in Reading a couple of weeks ago: basically implementing as much of LINQ to Objects as possible in an hour. We were far from complete by the end of it, but it should give a pretty good idea of the kind of thing you need to do (buffering/streaming, iterator blocks, query expression translation, etc.). The videos aren't up yet (and I haven't put the code up for download yet), but if you're interested, drop me a mail at skeet#pobox.com and I'll let you know when they're up. (I'll probably blog about it too.)
Mono (partially?) implements LINQ, and is open source. Maybe you could look into their implementation?
Read this article:
Learn how to create custom LINQ providers
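To give a flavour of what a provider does with the expression trees it receives, here is a toy `ExpressionVisitor` that turns a simple predicate into a SQL-like `WHERE` fragment. It is not how any particular provider is implemented: it handles only property access on the lambda parameter, literal constants, comparisons and `&&`/`||`, and it assumes column names equal property names. Captured local variables, method calls and nested members would all need extra handling.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

// Toy translator: walks a predicate's expression tree and appends a
// SQL-like WHERE fragment to a StringBuilder.
public class WhereTranslator : ExpressionVisitor
{
    private readonly StringBuilder _sql = new StringBuilder();

    public static string Translate<T>(Expression<Func<T, bool>> predicate)
    {
        var translator = new WhereTranslator();
        translator.Visit(predicate.Body);
        return translator._sql.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _sql.Append('(');
        Visit(node.Left);
        _sql.Append(node.NodeType switch
        {
            ExpressionType.Equal => " = ",
            ExpressionType.NotEqual => " <> ",
            ExpressionType.GreaterThan => " > ",
            ExpressionType.GreaterThanOrEqual => " >= ",
            ExpressionType.LessThan => " < ",
            ExpressionType.LessThanOrEqual => " <= ",
            ExpressionType.AndAlso => " AND ",
            ExpressionType.OrElse => " OR ",
            _ => throw new NotSupportedException($"Operator {node.NodeType} is not supported.")
        });
        Visit(node.Right);
        _sql.Append(')');
        return node;
    }

    // Assumes the column name equals the property name.
    protected override Expression VisitMember(MemberExpression node)
    {
        _sql.Append(node.Member.Name);
        return node;
    }

    // Literal values; strings are quoted with embedded quotes doubled.
    protected override Expression VisitConstant(ConstantExpression node)
    {
        _sql.Append(node.Value is string s
            ? "'" + s.Replace("'", "''") + "'"
            : Convert.ToString(node.Value, CultureInfo.InvariantCulture));
        return node;
    }
}
```

For example, `WhereTranslator.Translate<string>(s => s.Length > 3 && s.Length <= 10)` produces `((Length > 3) AND (Length <= 10))`.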
Perhaps my LINQ for R6RS Scheme will provide some insights.
It is 100% semantically, and almost 100% syntactically the same as LINQ, with the noted exception of additional sort parameters using 'then' instead of ','.
Some rules/assumptions:
Only dealing with lists, no query providers.
Not lazy, but eager comprehension.
No static types, as Scheme does not use them.
My implementation depends on a few core procedures:
map - used for 'Select'
filter - used for 'Where'
flatten - used for 'SelectMany'
sort - a multi-key sorting procedure
groupby - for grouping constructs
The rest of the structure is all built up using a macro.
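For comparison in C#, a sketch of the same eager, list-only idea: the core procedures as ordinary methods, and a two-`from` query expressed through them, with `Flatten` doing the job `SelectMany` does in LINQ. This is not the author's Scheme code; `EagerCore` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Eager, list-only analogues of map, filter and flatten.
public static class EagerCore
{
    public static List<R> Map<T, R>(List<T> xs, Func<T, R> f)
    {
        var result = new List<R>(xs.Count);
        foreach (T x in xs) result.Add(f(x));
        return result;
    }

    public static List<T> Filter<T>(List<T> xs, Func<T, bool> keep)
    {
        var result = new List<T>();
        foreach (T x in xs) if (keep(x)) result.Add(x);
        return result;
    }

    public static List<T> Flatten<T>(List<List<T>> xss)
    {
        var result = new List<T>();
        foreach (List<T> xs in xss) result.AddRange(xs);
        return result;
    }

    // from a in xs from b in ys where a < b select (a, b)
    // becomes: for each a, filter ys and map to pairs, then flatten (the SelectMany step).
    public static List<(int A, int B)> Pairs(List<int> xs, List<int> ys) =>
        Flatten(Map(xs, a => Map(Filter(ys, b => a < b), b => (a, b))));
}
```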
Bindings are stored in a list that is tagged with bound identifiers to ensure hygiene. The bindings are extracted and rebound locally wherever an expression occurs.
I tracked the progress on my blog, which may provide some insight into possible issues.
For design ideas, take a look at Cω (C omega), the research project that birthed LINQ. LINQ is a more pragmatic or watered-down version of Cω, depending on your perspective.
Matt Warren's blog has all the answers (and a sample IQueryable provider implementation to give you a headstart):
http://blogs.msdn.com/mattwar/
