This is sort of a follow up on this question I read:
What is the biggest mistake people make when starting to use LINQ?
The top answer is "That it should be used for everything." That made me wonder what exactly that means.
What are some common examples where someone used LINQ when they should not have?
You shouldn't use LINQ when the alternative is simpler or significantly more efficient.
I would suggest avoiding LINQ any time it makes the code less obvious.
However, in general, I think LINQ makes things easier to follow, not more difficult, so I rarely avoid it.
It is possible for LINQ to be significantly slower than the alternatives, especially if you materialize a lot of intermediate Lists. However, that only really matters on some pretty large datasets, larger than any I have personally encountered.
However, one thing to keep in mind is that a well-written LINQ query can also be much faster than the alternative because of the way IEnumerable works.
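A small illustration of that point (LINQ to Objects, with made-up numbers): because Where and Select are lazy, the chain below pulls only as many source items as First needs, and never builds an intermediate list.

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var numbers = Enumerable.Range(1, 1_000_000);

        // Deferred execution: nothing is enumerated until First() asks
        // for a result, and enumeration stops at the first match.
        int firstBigSquare = numbers
            .Where(n => n % 7 == 0)
            .Select(n => n * n)
            .First(sq => sq > 10_000);

        // Only 105 of the million source items are ever touched
        // (105 is the first multiple of 7 whose square exceeds 10,000).
        Console.WriteLine(firstBigSquare); // 11025
    }
}
```

A hand-written foreach that forgets to break early would scan all million items; the LINQ version short-circuits for free.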
Finally, using LINQ now will allow you to switch to Parallel LINQ when it is released with little or no changes.
It's still okay to use foreach. :)
I have a lot of similar-looking small pieces of code, e.g. parsing config files with JDOM and converting them into regex patterns. It's all stuff that is done in 10 lines. Writing some abstract meta-monster that does all of this would be very complicated.
Now, I always hear people complaining about duplicated code. Is it really such a bad thing in my use case? Having the code similar makes it easy to understand and maintain. There is no big interrelation between the functions.
Am I doing the right thing?
Over-engineering is an antipattern. If you don't need abstraction, don't use it.
Abstraction and patterns are most useful when your project is large or is expected to grow. If that isn't your situation, then Keep It Simple, Stupid.
It's also a matter of taste. Personally, even if it is sometimes discouraged, I prefer using patterns and abstraction even in simple situations if I feel that it might be useful in the future, because I hate rewriting the same lines of code twice. In addition, design patterns also help you to avoid errors because they put order into your code and class relations.
No, having very similar code makes it hard to maintain once you've got more than, say, three of those pieces of code. When you catch a bug (or get a spec change) that affects all or several of those pieces, you have to try to spot the differences between them. That can be even harder to fix than when they're all exactly the same.
The least you can do is try to lift out some commonalities and make a tiny library of well-named helper functions. Lifting out the tricky bits is more important than how many lines of code you save.
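To make that concrete, here is a hypothetical example (not from the asker's codebase): two near-identical snippets that turn a config value into a regex, with the tricky part lifted into one well-named helper.

```csharp
using System;
using System.Text.RegularExpressions;

static class ConfigPatterns
{
    // The tricky, easy-to-get-wrong part lives in one place:
    // trimming, escaping, and compiling the pattern.
    public static Regex ToPattern(string rawValue)
    {
        string trimmed = rawValue.Trim();
        // Escape everything, then re-enable '*' as a wildcard.
        string escaped = Regex.Escape(trimmed).Replace(@"\*", ".*");
        return new Regex("^" + escaped + "$", RegexOptions.Compiled);
    }
}

class Demo
{
    static void Main()
    {
        // Both former copies of the parsing code now share the helper;
        // a bug fix in ToPattern fixes every call site at once.
        Regex hosts = ConfigPatterns.ToPattern(" *.example.com ");
        Regex files = ConfigPatterns.ToPattern("report-*.xml");

        Console.WriteLine(hosts.IsMatch("www.example.com")); // True
        Console.WriteLine(files.IsMatch("report-2009.xml")); // True
    }
}
```

The ten-line call sites stay simple and readable; only the escaping logic, which is where bugs would hide, is shared.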
It really depends on what those 10 lines look like. Some cases that don't seem to warrant a proper abstraction can still be solved with a simple loop.
Think the title describes my thoughts pretty well :)
I've seen a lot of people lately who swear by LINQ, and while I also believe it's awesome, I don't think we should kid ourselves about the fact that on most (all?) IEnumerable types, its performance is not that great. Am I wrong in thinking this? Especially for queries where you nest Where()'s on large datasets?
Sorry if this question is a bit vague, I just want to confirm my thoughts in that you should be "cautious" when using LINQ.
[EDIT] Just to clarify - I'm thinking in terms of Linq to Objects here :)
It depends on the provider. For Linq to Objects, it's going to be O(n), but for Linq to SQL or Entities it might ultimately use indices to beat that. For Objects, if you need the functionality of Where, you're probably going to need O(n) anyway. Linq will almost certainly have a bigger constant, largely due to the function calls.
It depends on how you are using it and to what you compare.
I've seen many implementations using foreach loops that would have been much faster with LINQ, e.g. because they forget to break early or because they build up too many items. The trick is that the lambda expressions are executed only when an item is actually needed. When you have First at the end, the whole chain can end up being just a single pass.
So when you chain Wheres, an item that does not pass the first condition will not even be tested against the second one. It's similar to the && operator, which does not evaluate the right-hand side if the left-hand side is false.
You could say it's always O(n), but n is not the number of items in the source; it's the minimal number of items required to produce the target set. That's a pretty good optimization, IMHO.
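A quick sketch of that short-circuiting, with counters in the predicates (LINQ to Objects, made-up data) to show how few items each stage actually sees:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int firstChecks = 0, secondChecks = 0;

        int result = Enumerable.Range(1, 100)
            .Where(n => { firstChecks++; return n % 2 == 0; })  // even?
            .Where(n => { secondChecks++; return n % 3 == 0; }) // multiple of 3?
            .First();

        Console.WriteLine(result);       // 6
        Console.WriteLine(firstChecks);  // 6  (only items 1..6 were pulled)
        Console.WriteLine(secondChecks); // 3  (only 2, 4 and 6 reached it)
    }
}
```

Despite the source having 100 items, First stops the whole pipeline as soon as one item survives both filters, and the second Where never sees items rejected by the first.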
Here's a project that promises to introduce indexing for LINQ2Objects. This should deliver better asymptotic behavior: http://i4o.codeplex.com/
At what point does LINQ become too terse, so that procedural techniques should be resorted to instead?
Terseness is in the eye of the beholder. When you're not comfortable with the code anymore, then it's time to refactor it a bit. The refactoring could be swapping to some procedural bits, or breaking your linq queries apart, or whatever it takes to make it understandable again. As long as the intent of the code is obvious, it shouldn't matter how terse it is or what techniques are used to achieve the end goal :-)
Any language construct, not just LINQ, is too terse when the majority of people in your group cannot quickly understand what a line of code is doing.
When you can no longer do what is required to be done (easily).
There is a proliferation of new LINQ providers. It is really quite astonishing: an elegant combination of lambda expressions, anonymous types, and generics, with some syntactic sugar on top to make it easy reading. Everything is LINQed now, from SQL to web services like Amazon to streaming sensor data to parallel processing. It seems like someone is creating an IQueryable<T> for everything, but these data sources can have radically different performance, latency, availability, and reliability characteristics.
It gives me a little pause that LINQ makes those performance details transparent to the developer. Is LINQ a solid general purpose abstraction or a RAD tool or both?
To me, LINQ is just a way to make code more readable, and hence more maintainable. LINQ does nothing more than takes standard methods and integrates them into the language (hence the name - language integrated query).
It's nothing but a syntax element around normal interfaces and methods - there is no "magic" here, and LINQ-to-something really should (IMO) be treated as any other 3rd party API - you need to understand the cost/benefits of using it just like any other technology.
That being said, it's a very nice syntax helper - it does a lot for making code cleaner, simpler, and more maintainable, and I believe that's where its true strengths lie.
I see this as similar, in its design, to the model of multiple storage engines in an RDBMS accepting a common(-ish) dialect of SQL... but with the added benefit of integration into the application language's semantics. Of course it is good!
I have not used it that much, but it looks sensible and clear when performance and the extra layers of abstraction are not in a position to have a negative impact on the development process (and you trust that the standards and models won't change wildly).
It is just an interface and an implementation that may or may not fit your needs. As with all interfaces, abstractions, and libraries, the question is the same: does it fit?
I suppose - no.
LINQ is just a convenient syntax, not a general RAD tool. In big projects with complex logic, I have noticed that developers make more errors in LINQ than they would if they wrote the same thing in the .NET 2.0 manner. The code is produced faster and it is smaller, but it is harder to find bugs. Sometimes it is not obvious at first glance at what point the queried collection turns from IQueryable into IEnumerable... I would say that LINQ requires more skilled and disciplined developers.
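A sketch of that IQueryable/IEnumerable boundary, using an in-memory AsQueryable() as a stand-in for a real LINQ to SQL table (with a real provider, the difference is whether the filter becomes SQL on the server or runs row by row on the client):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Stand-in for a database table; with LINQ to SQL this would be
        // a Table<Order>, and the first Where below would translate to SQL.
        IQueryable<int> orders = Enumerable.Range(1, 1000).AsQueryable();

        // Still IQueryable: the provider gets the whole expression tree
        // and can push the filter down to the data source.
        IQueryable<int> serverSide = orders.Where(o => o > 990);

        // AsEnumerable() silently crosses the boundary: everything after
        // it runs in memory on the client, pulling every row first.
        IEnumerable<int> clientSide = orders.AsEnumerable().Where(o => o > 990);

        Console.WriteLine(serverSide.Count()); // 10
        Console.WriteLine(clientSide.Count()); // 10, but filtered in memory
    }
}
```

Both versions compile and return the same ten rows, which is exactly why the switch is easy to miss in review; only the static type tells you where the work happens.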
Also, the SQL-like syntax is fine for functional programming, but it is a sidestep from object-oriented thinking. Sometimes when you see two very similar LINQ queries, they look like copy-paste code, but refactoring is not always possible (or is possible only by sacrificing some performance).
I heard that MS is not going to develop LINQ to SQL much further and will give more priority to Entities (Is the ADO.NET Team Abandoning LINQ to SQL?). Isn't that fact a signal to us that LINQ is not a panacea for everybody?
If you are thinking about building a connector to "something", you can build it without LINQ and, if you like, provide LINQ as an optional wrapper around it, like LINQ to Entities. Your customers can then decide whether to use LINQ or not, depending on their needs, required performance, etc.
p.s.
.NET 4.0 will come with dynamics, and I expect that everybody will start using them the way they use LINQ... without taking into consideration that code simplicity, quality, and performance may suffer.
I was planning to benchmark this, but since it's a lot of work, I'd like to check that I haven't missed any obvious answer first.
I have a huge query that gets some more details for each row with a subquery.
Each row is then used in a ListAdapter that is plugged into a ListView, so another loop takes each row one by one to turn it into a ListItem.
What do you think is more efficient :
Keeping the subqueries in the big SQL mess, counting on the SQL engine to optimize them.
Taking the subqueries out into the ListAdapter loop, so we lazy-load the details on display: much more readable, but I'm afraid too many hits would slow down the process.
Two important things :
I can't rewrite the big SQL chunk to get rid of the subqueries. I know it would be better, but I failed to do so.
As far as I can tell, a list won't contain more than 1000 items, and it's a desktop app, so there is no concurrency. Is it even relevant to care about performance in that case? If not, I'd still be interested in the answer for a high-traffic web site anyway. It's good to know...
SQLite is a surprisingly good little engine, but it's not really about extra-clever optimizations, and I wouldn't really consider it for a "high traffic web site". One big plus (for uses within its limitations) is that it can run in-process, so the overhead of multiple queries is really small compared to one big query; if that's easiest to code for your specific use case, I would really consider it (and doing it in a "lazy load" way, as you hint, might actually make the first screen of data appear faster!). As you suspect, it's unlikely that this will be a performance bottleneck in your use case, so going for simpler and thus more reliable coding is an important plus.
If I were doing a high-traffic site, using a richer, "heavier" engine such as PostgreSQL, Oracle, SQL Server, or DB2, I would trust the optimizer much more. One thing I've noticed, however, is that I can often (alas, not always) change subqueries into joins, and that tends to improve performance; joins make it easier for the optimizer to use good indices, I think. I have never written a SQL optimizer myself, but that's my impression from staring at query execution plans from many engines for alternative forms of queries. That does, of course, assume you have good indices! This would have to be confirmed with a benchmark of the specific case in question, but it would be my initial working assumption.
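As an illustration of the kind of rewrite I mean (a hypothetical items/details schema, not the asker's actual tables):

```sql
-- Hypothetical schema: items(id, name), details(item_id, hits).

-- Correlated subquery: conceptually re-evaluated for each row of items.
SELECT i.id, i.name,
       (SELECT SUM(d.hits) FROM details d WHERE d.item_id = i.id) AS total_hits
FROM items i;

-- Equivalent join + GROUP BY: one pass over both tables, and an index
-- on details(item_id) is easy for the optimizer to exploit.
SELECT i.id, i.name, SUM(d.hits) AS total_hits
FROM items i
LEFT JOIN details d ON d.item_id = i.id
GROUP BY i.id, i.name;
```

The LEFT JOIN keeps items with no detail rows (their SUM comes back NULL, matching the subquery's behavior); whether it actually wins still has to be checked against the engine's execution plan.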
What about using cursors?
I would prefer using a big query and let my SQL engine optimize my query.
Also, I can't think of an example where it's better to do a loop outside SQL instead of using one "big" query or cursors.
But the best way to know what's better is to benchmark it.
Good luck!