Is LINQ to Everything a good abstraction?

There is a proliferation of new LINQ providers. It is really quite astonishing: an elegant combination of lambda expressions, anonymous types and generics, with some syntactic sugar on top to make it easy to read. Everything is LINQed now, from SQL to web services like Amazon to streaming sensor data to parallel processing. It seems like someone is creating an IQueryable<T> for everything, but these data sources can have radically different performance, latency, availability and reliability characteristics.
It gives me a little pause that LINQ hides those performance details from the developer. Is LINQ a solid general-purpose abstraction, a RAD tool, or both?

To me, LINQ is just a way to make code more readable, and hence more maintainable. LINQ does nothing more than take standard methods and integrate them into the language (hence the name - language integrated query).
It's nothing but a syntax element around normal interfaces and methods - there is no "magic" here, and LINQ-to-something really should (IMO) be treated like any other 3rd party API - you need to understand the costs and benefits of using it just like any other technology.
That being said, it's a very nice syntax helper - it does a lot to make code cleaner, simpler, and more maintainable, and I believe that's where its true strengths lie.

I see this as similar, in its design, to the model of multiple storage engines in an RDBMS accepting a common(-ish) language of SQL ... but with the added benefit of integration into the application language semantics. Of course it is good!
I have not used it that much, but it looks sensible and clear when performance and layers of abstraction are not in a position to have a negative impact on the development process (and you trust that standards and models won't change wildly).
It is just an interface and implementation that may fit your needs. As with all interfaces, abstractions, libraries and implementations, the question is whether it fits... and the answers are all the same.

I suppose - no.
LINQ is just a convenient syntax, but not a common RAD tool. In big projects with complex logic I have noticed that developers make more errors in LINQ than they would if they wrote the same thing in the .NET 2.0 manner. The code is produced faster and it is smaller, but it is harder to find bugs. Sometimes it is not obvious at first glance at what point the queried collection turns from IQueryable into IEnumerable... I would say that LINQ requires more skilled and disciplined developers.
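The IQueryable-to-IEnumerable switch matters because operators applied before it can be translated into the provider's query, while operators applied after it run in memory over whatever has already been fetched. A minimal sketch of that difference, written in Python with the standard sqlite3 module rather than a LINQ provider; the orders table and both function names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(t,) for t in range(1000)])

def filter_in_database(conn, minimum):
    """The predicate travels to the data source; only matches come back."""
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE total >= ? ORDER BY id",
        (minimum,),
    ).fetchall()
    return rows, len(rows)  # rows fetched equals rows kept

def filter_in_memory(conn, minimum):
    """Every row is fetched first, then filtered on the client."""
    fetched = conn.execute(
        "SELECT id, total FROM orders ORDER BY id"
    ).fetchall()
    kept = [row for row in fetched if row[1] >= minimum]
    return kept, len(fetched)

db_rows, db_fetched = filter_in_database(conn, 990)
mem_rows, mem_fetched = filter_in_memory(conn, 990)
print(db_rows == mem_rows)      # same result either way
print(db_fetched, mem_fetched)  # but very different amounts of data moved
```

Both functions return the same rows, so the mistake is invisible in the result and only shows up as extra data transferred.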
Also, SQL-like syntax is OK for functional programming, but it is a sidestep from object-oriented thinking. Sometimes two very similar LINQ queries look like copy-paste code, but refactoring is not always possible (or is possible only by sacrificing some performance).
I heard that MS is not going to develop LINQ to SQL further and will give more priority to Entities (see "Is the ADO.NET Team Abandoning LINQ to SQL?"). Isn't this a signal to us that LINQ is not a panacea for everybody?
If you are thinking about building a connector to "something", you can build it without LINQ and, if you like, provide LINQ as an additional optional wrapper around it, like LINQ to Entities. Your customers can then decide whether to use LINQ or not, depending on their needs, required performance etc.
P.S. .NET 4.0 will come with dynamics, and I expect that everybody will start to use them the way they use LINQ... without taking into consideration that code simplicity, quality and performance may suffer.

Related

Linq, entity framework and their usage

I had to develop a system for my university that tracks almost all of its data (lectures, lecturers, teaching assistants, students, etc.).
My database has about 30 tables, and it's pretty complex.
I used EF and LINQ to connect to the database and query it.
But the further I go, the harder my queries become to write, not to mention to maintain.
Here is the example of one query: http://pastebin.com/Za1cYMPa
It is pretty much chaos, as you can see.
So, am I misusing LINQ (LINQ can solve this, but in a different way), or is LINQ just not suited to more complex systems like this one?
This is a general question, not one to solve a particular problem.
Do you think that query would look better or be more maintainable if written in native SQL? In that case you could hide the query in a stored procedure. Once you get into advanced queries, it is always a mess. You can reduce some of the complexity by hiding subqueries in database views or in EF query views. But if you have a highly normalized OLTP database and you need to run a complex reporting / analysis / data mining query, it will always be big and hard to maintain. That is the reason why OLAP systems exist (I didn't check the content of your query, just its length, so don't take this as a reason to build an OLAP cube).
More important is performance of the query ...
In general, complexity can be reduced by abstracting your repetitive code into reusable, easily maintainable components.
In your particular case, the complexity can be reduced by implementing the Repository and Specification patterns to make your code more consistent, easier to read, and easier to maintain.
There is a very helpful sequence of articles by Huy Nhuyen explaining the repository pattern, the specification pattern and how to effectively combine them with the Entity Framework for better code. These are available at:
Entity Framework 4 POCO, Repository and Specification Pattern
Specification Pattern In Entity Framework 4 Revisited
Entity Framework 4 POCO, Repository and Specification Pattern [Upgraded to CTP5]
Entity Framework 4 POCO, Repository and Specification Pattern [Upgraded to EF 4.1]
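To give a rough idea of how the two patterns fit together, here is a minimal in-memory sketch in Python. It is not taken from the articles above: the Specification, Student and StudentRepository names are hypothetical, and a real Entity Framework version would build expression trees that are translated to SQL instead of filtering a list.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")

class Specification(Generic[T]):
    """Wraps a predicate so that rules can be named and combined."""

    def __init__(self, predicate: Callable[[T], bool]) -> None:
        self._predicate = predicate

    def is_satisfied_by(self, candidate: T) -> bool:
        return self._predicate(candidate)

    def __and__(self, other: "Specification[T]") -> "Specification[T]":
        return Specification(
            lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c)
        )

    def __invert__(self) -> "Specification[T]":
        return Specification(lambda c: not self.is_satisfied_by(c))

@dataclass
class Student:  # hypothetical entity
    name: str
    year: int
    enrolled: bool

class StudentRepository:
    """In-memory stand-in for a repository backed by an ORM."""

    def __init__(self, students: Iterable[Student]) -> None:
        self._students = list(students)

    def find(self, spec: Specification[Student]) -> List[Student]:
        return [s for s in self._students if spec.is_satisfied_by(s)]

# Each rule is defined once and reused wherever it is needed.
is_enrolled = Specification(lambda s: s.enrolled)
is_first_year = Specification(lambda s: s.year == 1)

repo = StudentRepository([
    Student("Ana", 1, True),
    Student("Ben", 1, False),
    Student("Cat", 3, True),
])
names = [s.name for s in repo.find(is_enrolled & ~is_first_year)]
print(names)  # ['Cat']
```

The point of the design is that a long query becomes a combination of small named rules, each of which can be tested on its own.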
Ooof, that is pretty nasty.
You should think about your database design. I believe having so many tables is causing your queries to be overly complicated. A well-placed set of views (or stored procedures) can simplify the querying logic. However, this will not simplify your overall system, because at some point you have to connect all these tables.

Will usage of LINQ increase day by day or is it that some organizations do not like to use it?

Linq allows you to simplify your code which is always good, as it makes code less fragile and easier to maintain - as long as your intent (as the developer) is clear.
In my experience projects are only light on Linq usage if the development team don't understand the technology fully, or feel that it doesn't fit into their naive views on proper 'OO' architectures and patterns.
This is highly subjective and depends on context, but I would say absolutely. If you've built a medium-sized application both with and without an ORM, you will quickly understand the immense benefits LINQ affords. It's hard to imagine an organization building subsequent applications without an ORM in conjunction with LINQ.

Most powerful and unexpected benefit of Linq in .NET OOP/D?

Since learning about Linq and gaining experience in it, I find myself leaning on it more and more. It’s changing how I think about my classes. Some of these changes were expected (e.g. using collections more), but others were unexpected (e.g. getting initial data for a class as an XElement and sometimes just keeping it there, processing it lazily).
What is the most powerful and unexpected benefit of Linq to .NET OOP/D? I am thinking of Linq-to-objects and Linq-to-xml in particular, but include Linq-to-Entities/SQL too in so far as it has changed your class strategy.
I've noticed a couple of significant benefits from using LINQ:
Maintainability - it's much easier to understand what code does when you read a semantic transformation using LINQ, rather than some confusing looping constructs hand-written by a developer.
Performance - Because of LINQ's deferred and streaming execution, you often end up with code that performs better - either by distributing the workload, or by allowing unnecessary transformations to be avoided (particularly when only a subset of the results is consumed). In the future, as multicore processing becomes more significant, I expect that many LINQ methods may evolve to support native parallel processing (think multi-core sort) - which should help keep .NET applications scalable in the multi-core future.
There are a couple of other nice benefits:
Awareness of Iterator Generators: Once developers learn about LINQ, some of them go on to learn about how it works. This helps to generate awareness of the yield return syntax in C# - which is a powerful way of writing concise and correct sequence iterators.
Focus on business problems: LINQ frees developers to focus on solving the underlying business problems, rather than trying to optimize loops and algorithms to run in the fewest cycles, or use the least number of lines of code. This goes beyond just the productivity of having a library of powerful sequence transformation routines.
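The deferred, streaming behaviour described above is not specific to C#. As a rough analogue of a `yield return` iterator, here is a sketch in Python, where a generator plays the same role; the function names and the log list are illustrative only:

```python
from itertools import islice

def expensive_transform(values, log):
    """Generator: transforms one item at a time, only when asked."""
    for v in values:
        log.append(v)  # record which inputs were actually processed
        yield v * v

def first_n_squares(values, n, log):
    # Building the pipeline does no work; items are pulled on demand.
    pipeline = expensive_transform(values, log)
    return list(islice(pipeline, n))

log = []
result = first_n_squares(range(1_000_000), 3, log)
print(result)    # [0, 1, 4]
print(len(log))  # 3
```

Only three of the million inputs are ever transformed, which is the "unnecessary transformations avoided" case from the performance point.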
I feel the code is easier to maintain and easier to test than a solution in SQL stored procedures.
Combining LINQ with extension methods, I get something like this (maybe I should use some kind of fluent interface...):
return source.Growth().ShareOfChangeDate();
where Growth and ShareOfChangeDate are extension methods that I can easily unit test,
and, as LBushkin says, I can present the line above to the customer when we discuss it.
Issues
I feel I get less control over the SQL generated, and finding performance problems is a little bit magic...

What's the current state of ORMs?

Historically I've been completely against using ORMs for all but the most basic applications.
My reasoning has always been that it's a very leaky abstraction ... mostly because SQL provides a very powerful way to retrieve data from a relational source, which usually gets messed up by the ORM, so that you lose a lot of performance to gain the appearance of not having a relational backend.
I've always thought the data should be kept in the database, not eat up application memory, which won't scale anyway. In addition, the performance hit of being too generic is harmful. For example, if I need the name and address of all the clients in my database, SQL provides me with an easy way to get them in one query. With an ORM I need to get all the clients and then each name and address; even if it's lazy loaded it's going to take a LOT longer.
That's what I think, but has any of the above changed? I'm seeing a lot of ORMs like the Entity Framework, NHibernate, etc., and they seem to have gained a lot of popularity lately... Are they worth it? Do they solve the problems I describe above?
Please read "All Abstractions Are Failed Abstractions". It should put a lot of your questions in perspective.
Performance is usually not an issue with an ORM - and if you really find yourself in a situation where it is, there is usually the option to handcraft the SQL statements the ORM uses.
IMHO ORMs give you an instant and huge development speed increase. That's why they are so popular. And using them right does not make you paint yourself into a corner. There is always the option of hand-tuning the performance.
Edit:
Even though Jeff focuses on Linq to SQL, everything he says about abstractions and performance is equally true for NHibernate (which I know from years of real-world app development). IMHO one should use an ORM by default, since they are more than fast enough for the notorious 90% of situations. Code written for an ORM is usually more maintainable and readable, especially when it is picked up by the next developer who inherits it. Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live. Never forget about that guy!
In addition, they give you out-of-the-box caching, lazy loading, unit of work, ... you name it. And I found that when I was not happy about the performance of the ORM, it was MY fault. ORMs do force you to adhere to good OO design practices and help you shape your domain model.
On the Ruby on Rails side, ActiveRecord -- essentially an ORM -- is the basis of 95% of Rails applications (made-up statistic, but it's around there). Actually, to get to that 95% we would probably need to include other ORMs for Rails, like DataMapper.
The abstraction is leaky, and a developer can always dip down to SQL as necessary. Even when you're not using SQL directly, you have to think about number of database hits, etc. For instance, in ActiveRecord, "eager loading" is used to avoid multiple database hits, so you see stuff like this (includes the related "author" field of each Post in the initial query... it does a join under the hood, I think)
for post in Post.find(:all, :include => :author)
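The difference in database hits that eager loading avoids can be sketched without any ORM. The following Python example uses only the standard sqlite3 module and mimics the two query shapes by hand; the post and author tables and the function names are hypothetical, and this is not how ActiveRecord is implemented internally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO post VALUES (1, 'First', 1), (2, 'Second', 2), (3, 'Third', 1);
""")

def lazy_style(conn):
    """One query for the posts, then one more per post for its author."""
    queries = 0
    posts = conn.execute(
        "SELECT title, author_id FROM post ORDER BY id"
    ).fetchall()
    queries += 1
    rows = []
    for title, author_id in posts:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        rows.append((title, name))
    return rows, queries

def eager_style(conn):
    """A single JOIN returns the same rows in one round trip."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM post p "
        "JOIN author a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1

lazy_rows, lazy_queries = lazy_style(conn)
eager_rows, eager_queries = eager_style(conn)
print(lazy_rows == eager_rows)      # True
print(lazy_queries, eager_queries)  # 4 1
```

With three posts the lazy shape issues four queries where the eager one issues a single query, and the gap grows linearly with the number of rows.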
The abstraction leaks, as all abstractions do, but that's not really the point. To decide whether to use the abstraction or not, you have to consider whether it will add to or reduce your general workload. In other words, will you spend more time retrofitting your concepts to make the abstraction work, or is it ready to do what you need without much hacking (saving you time)?
I think that the abstractions that work are those that are mature: ActiveRecord has been around the block a ton (as has Hibernate), so it provides an abstract way to patch most of the leaks you would normally be worried about, without explicitly rolling your own lower-level solution (i.e., without writing SQL).
Beyond the learning curve, I think that ORMs are an amazing time-saver for most of your database access, and that most apps actually do make quite "normal" use of the DB. While it may not be your case whatsoever, eschewing an ORM for direct DB access is often a case of early, and unnecessary, optimization.
Edit: I hadn't seen this, but the Jeff quote is
"Does this abstraction make our code at least a little easier to write? To understand? To troubleshoot? Are we better off with this abstraction than we were without it?"
saying essentially the same thing.
Some of the more modern ORMs are really powerful tools that solve a lot of real-world problems. The good ORMs don't try to hide the relational model from you, but actually leverage it to make OO programming more powerful. They really aren't abstractions in the sense that they let you ignore the "low-level" details of relational algebra; instead they are toolkits that let you build abstractions on the relational model and make it easier to bring data into the imperative model, track the changes and push them back to the database. The SQL language really doesn't provide any good way to factor out common predicates into composable, reusable components to achieve business-rule-level abstractions.
Sure, there is a performance hit, but it's mostly a constant-factor thing, as you can make the ORM issue whatever SQL you would issue yourself. For your name and address example, in SQLAlchemy you'd just do
for name, address in session.query(Client.name, Client.address):
    pass  # process data
and you're done. But where the ORM helps you is when you have reusable relations and predicates. For instance, say you have defined a way to join to a client's favorited items, and a predicate to see if it is on sale. Then you can get the list of clients that have some of their favorite items on sale while also fetching the assigned salesperson with the following query:
potential_sales = (session.query(Client).join(Client.favorite_items)
                   .filter(Item.is_on_sale)
                   .options(eagerload(Client.assigned_salesperson)))
At least for me, a query written like this is a lot faster to write, and its intent is clearer and easier to understand, than a dozen lines of SQL.
As with any abstraction, you'll have to pay, either in the form of performance or of leaks. I agree with you in being against ORMs, since SQL is a clean and elegant language. I've sort of written my own little frameworks which do these things for me, but hey, then I sat there with my own ORM (but with a little more control over it than with, for example, Hibernate). The people behind Hibernate state that it is fast. It should be able to do about 95% of the boring work against your database (simple queries, updates etc.) while giving you the freedom to do the last 5% yourself if you want (you can always write your own mappings in special cases).
I think most of the popularity stems from the fact that many programmers are lazy and want established frameworks to do the dirty, boring persistence job for them (I can understand that), but the price of an abstraction will always be there. I would consider my options thoroughly before choosing to use an ORM in a serious project.

LINQ2SQL performance vs. custom DAL vs. NHibernate

Given a straightforward user-driven, high traffic web application (no fancy reporting/BI):
If my utmost goal is performance (not ease of maintainability, ease of queryability, etc.), I would surmise that in most cases a roll-your-own DAL would be the best choice.
However, if I were to choose Linq2SQL or NHibernate, roughly what kind of performance hit would we be talking about? 10%? 20%? 200%? Of the two, which would be faster?
Does anyone have any real-world numbers that could shed some light on this? (And yes, I know Stack Overflow runs on Linq2SQL.)
If you know your stuff (esp. in SQL and ADO.NET), then yes - most likely, you'll be able to create a highly tweaked, highly optimized custom DAL for your particular interest and be faster overall than a general-purpose ORM like Linq-to-SQL or NHibernate.
As to how much - that's really, really hard to say without knowing your concrete table structure, data and usage patterns. I remember Rico Mariani did some Linq-to-SQL vs. raw SQL comparisons, and his end result was that Linq-to-SQL achieves over 90% of the performance of a highly skilled SQL programmer.
See: http://blogs.msdn.com/ricom/archive/2007/07/05/dlinq-linq-to-sql-performance-part-4.aspx
Not too shabby in my book, especially if you factor in the productivity gains you get - but that's the big trade-off always: productivity vs. raw performance.
Here's another blog post on Entity Framework and Linq-to-SQL compared to DataReader and DataTable performance.
I don't have any such numbers for NHibernate, unfortunately.
In two high-traffic web apps, refactoring an ORM call to use a stored procedure from ADO.NET only got us about a 1-2% change in CPU and time.
Going from an ORM to a custom DAL is an exercise in micro optimization.

Resources