When is LINQ (to objects) Overused? - linq

My career started as a hard-core functional-paradigm developer (LISP), and now I'm a hard-core .NET/C# developer. Of course I'm enamored with LINQ. However, I also believe in (1) using the right tool for the job and (2) preserving the KISS principle: of the 60+ engineers I work with, perhaps only 20% have hours of LINQ / functional-paradigm experience, and 5% have 6 to 12 months of such experience. In short, I feel compelled to stay away from LINQ unless I'm hampered in achieving a goal without it (where replacing 3 lines of O-O code with one line of LINQ does not count as a "goal").
But now one of the engineers, having 12 months LINQ / functional-paradigm experience, is using LINQ to objects, or at least lambda expressions anyway, in every conceivable location in production code. My various appeals to the KISS principle have not yielded any results. Therefore...
What published studies can I appeal to next? What "coding standard" guidelines have others concocted with some success? Are there published LINQ performance issues I could point out? In short, I'm trying to achieve my first goal - KISS - by indirect persuasion.
Of course this problem could be extended to countless other areas (such as overuse of extension methods). Perhaps there is an "uber" guide, highly regarded (e.g. published studies, etc), that takes a broader swing at this. Anything?
LATE EDIT: Wow! I got schooled! I agree I'm coming at this entirely wrong-headed. But as a clarification, please take a look below at sample code I'm actually seeing. Originally it compiled and worked, but its purpose is now irrelevant. Just go with the "feel" of it. Now that I'm revisiting this sample a half year later, I'm getting a very different picture of what is actually bothering me. But I'd like to have better eyes than mine make the comments.
//This looks like it was meant to become an extension method...
public class ExtensionOfThreadPool
{
    public static bool QueueUserWorkItem(Action callback)
    {
        return ThreadPool.QueueUserWorkItem((o) => callback());
    }
}

public class LoadBalancer
{
    //other methods and state variables have been stripped...
    void ThreadWorker()
    {
        // The following callbacks give us an easy way to control whether
        // we add additional headers around outbound WCF calls.
        Action<Action> WorkRunner = null;
        // This callback adds headers to each WCF call it scopes
        Action<Action> WorkRunnerAddHeaders = (Action action) =>
        {
            // Add the header to all outbound requests.
            HttpRequestMessageProperty httpRequestMessage = new HttpRequestMessageProperty();
            httpRequestMessage.Headers.Add("user-agent", "Endpoint Service");
            // Open an operation scope - any WCF calls in this scope will add the
            // headers above.
            using (OperationContextScope scope = new OperationContextScope(_edsProxy.InnerChannel))
            {
                // Seed the agent id header
                OperationContext.Current.OutgoingMessageProperties[HttpRequestMessageProperty.Name] = httpRequestMessage;
                // Activate
                action();
            }
        };
        // This callback does not add any headers to each WCF call
        Action<Action> WorkRunnerNoHeaders = (Action action) =>
        {
            action();
        };
        // Assign the work runner we want based on the userWCFHeaders
        // flag.
        WorkRunner = _userWCFHeaders ? WorkRunnerAddHeaders : WorkRunnerNoHeaders;
        // This outer try/catch exists simply to dispose of the client connection
        try
        {
            Action Exercise = () =>
            {
                // This worker thread polls a work list
                Action Driver = null;
                Driver = () =>
                {
                    LoadRunnerModel currentModel = null;
                    try
                    {
                        // random starting value, it matters little
                        int minSleepPeriod = 10;
                        int sleepPeriod = minSleepPeriod;
                        // Loop infinitely or until stop signals
                        while (!_workerStopSig)
                        {
                            // Sleep the minimum period of time to service the next element
                            Thread.Sleep(sleepPeriod);
                            // Grab a safe copy of the element list
                            LoadRunnerModel[] elements = null;
                            _pointModelsLock.Read(() => elements = _endpoints);
                            DateTime now = DateTime.Now;
                            var pointsReadyToSend = elements.Where
                            (
                                point => point.InterlockedRead(() => point.Live && (point.GoLive <= now))
                            ).ToArray();
                            // Get a list of all the points that are not ready to send
                            var pointsNotReadyToSend = elements.Except(pointsReadyToSend).ToArray();
                            // Walk each model - we touch each one inside a lock
                            // since there can be other threads operating on the model
                            // including timeouts and returning WCF calls.
                            pointsReadyToSend.ForEach
                            (
                                model =>
                                {
                                    model.Write
                                    (
                                        () =>
                                        {
                                            // Keep a record of the current model in case
                                            // it throws an exception while we're staging it
                                            currentModel = model;
                                            // Lower the live flag (if we crash calling
                                            // BeginXXX the catch code will re-start us)
                                            model.Live = false;
                                            // Get the step for this model
                                            ScenarioStep step = model.Scenario.Steps.Current;
                                            // This helper enables the scenario watchdog if a
                                            // scenario is just starting
                                            Action StartScenario = () =>
                                            {
                                                if (step.IsFirstStep && !model.Scenario.EnableWatchdog)
                                                {
                                                    model.ScenarioStarted = now;
                                                    model.Scenario.EnableWatchdog = true;
                                                }
                                            };
                                            // make a connection (if needed)
                                            if (step.UseHook && !model.HookAttached)
                                            {
                                                BeginReceiveEventWindow(model, step.HookMode == ScenarioStep.HookType.Polled);
                                                step.RecordHistory("LoadRunner: Staged Harpoon");
                                                StartScenario();
                                            }
                                            // Send/Receive (if needed)
                                            if (step.ReadyToSend)
                                            {
                                                BeginSendLoop(model);
                                                step.RecordHistory("LoadRunner: Staged SendLoop");
                                                StartScenario();
                                            }
                                        }
                                    );
                                }
                                , () => _workerStopSig
                            );
                            // Sleep until the next point goes active. Figure out
                            // the shortest sleep period we have - that's how long
                            // we'll sleep.
                            if (pointsNotReadyToSend.Count() > 0)
                            {
                                var smallest = pointsNotReadyToSend.Min(ping => ping.GoLive);
                                sleepPeriod = (smallest > now) ? (int)(smallest - now).TotalMilliseconds : minSleepPeriod;
                                sleepPeriod = sleepPeriod < 0 ? minSleepPeriod : sleepPeriod;
                            }
                            else
                                sleepPeriod = minSleepPeriod;
                        }
                    }
                    catch (Exception eWorker)
                    {
                        // Don't recover if we're shutting down anyway
                        if (_workerStopSig)
                            return;
                        Action RebootDriver = () =>
                        {
                            // Reset the point SendLoop that barfed
                            Stagepoint(true, currentModel);
                            // Re-boot this thread
                            ExtensionOfThreadPool.QueueUserWorkItem(Driver);
                        };
                        // This means SendLoop barfed
                        if (eWorker is BeginSendLoopException)
                        {
                            Interlocked.Increment(ref _beginHookErrors);
                            currentModel.Write(() => currentModel.HookAttached = false);
                            RebootDriver();
                        }
                        // This means BeginSendAndReceive barfed
                        else if (eWorker is BeginSendLoopException)
                        {
                            Interlocked.Increment(ref _beginSendLoopErrors);
                            RebootDriver();
                        }
                        // The only kind of exceptions we expect are the
                        // BeginXXX type. If we made it here something else bad
                        // happened so allow the worker to die completely.
                        else
                            throw;
                    }
                };
                // Start the driver thread. This thread will poll the point list
                // and keep shoveling them out
                ExtensionOfThreadPool.QueueUserWorkItem(Driver);
                // Wait for the stop signal
                _workerStop.WaitOne();
            };
            // Start
            WorkRunner(Exercise);
        }
        catch (Exception ex) { /* not shown */ }
    }
}

Well, it sounds to me like you're the one wanting to make the code more complicated - because you believe your colleagues aren't up to the genuinely simple approach. In many, many cases I find LINQ to Objects makes the code simpler - and yes that does include changing just a few lines to one:
int count = 0;
foreach (Foo f in GenerateFoos())
{
    count++;
}
becoming
int count = GenerateFoos().Count();
for example.
Where it isn't making the code simpler, it's fine to try to steer him away from LINQ - but the above is an example where you certainly aren't significantly hampered by avoiding LINQ, yet the "KISS" code is clearly the LINQ code.
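For contrast, here's the kind of thing worth pushing back on - a contrived example (the names are made up, it's not from your sample) where a fold is used purely to avoid writing a loop:
using System;
using System.Collections.Generic;
using System.Linq;

class OveruseExample
{
    static void Main()
    {
        var names = new List<string> { "alpha", "beta", "gamma" };

        // Aggregate used only to dodge a foreach: the dictionary mutation is
        // buried in a lambda, and the intent is harder to see than in the loop.
        var lengthsViaLinq = names.Aggregate(new Dictionary<string, int>(),
            (dict, name) => { dict[name] = name.Length; return dict; });

        // The plain version states the same thing directly.
        var lengthsViaLoop = new Dictionary<string, int>();
        foreach (string name in names)
        {
            lengthsViaLoop[name] = name.Length;
        }

        Console.WriteLine(lengthsViaLinq.Count == lengthsViaLoop.Count); // True
    }
}
Here the foreach is the KISS code, just as Count() is the KISS code above - the judgement goes case by case, not paradigm by paradigm.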
It sounds like your company could benefit from training up its engineers to take advantage of LINQ to Objects, rather than trying to always appeal to the lowest common denominator.

You seem to be equating Linq to objects with greater complexity, because you assume that unnecessary use of it violates "keep it simple, stupid".
All my experience has been the opposite: it makes complex algorithms much simpler to write and read.
On the contrary, I now regard imperative, statement-based, state-mutational programming as the "risky" option to be used only when really necessary.
So I'd suggest that you put effort into getting more of your colleagues to understand the benefit. It's a false economy to try to limit your approaches to those that you (and others) already understand, because in this industry it pays huge dividends to stay in touch with "new" practices (of course, this stuff is hardly new, but as you point out, it's new to many from a Java or C# 1.x background).
As for trying to pin some charge of "performance issues" on it, I don't think you're going to have much luck. The overhead involved in Linq-to-objects itself is minuscule.
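To give a feel for what I mean, here is a made-up sketch (the types are invented for illustration): totalling sales per product and ranking them, imperatively and declaratively.
using System;
using System.Collections.Generic;
using System.Linq;

class DeclarativeVsImperative
{
    class Sale { public string Product; public decimal Amount; }

    static void Main()
    {
        var sales = new List<Sale>
        {
            new Sale { Product = "apples", Amount = 3m },
            new Sale { Product = "pears",  Amount = 5m },
            new Sale { Product = "apples", Amount = 4m },
        };

        // Imperative, state-mutational version: the result is built up step by step.
        var totals = new Dictionary<string, decimal>();
        foreach (var s in sales)
        {
            if (!totals.ContainsKey(s.Product))
                totals[s.Product] = 0m;
            totals[s.Product] += s.Amount;
        }
        var ranked = totals.ToList();
        ranked.Sort((a, b) => b.Value.CompareTo(a.Value));

        // Declarative version: the shape of the result is stated in one expression.
        var rankedLinq = sales.GroupBy(s => s.Product)
                              .Select(g => new { Product = g.Key, Total = g.Sum(x => x.Amount) })
                              .OrderByDescending(x => x.Total)
                              .ToList();

        Console.WriteLine(ranked[0].Key + " = " + rankedLinq[0].Total); // apples = 7
    }
}
Both produce the same answer; the difference is that the second reads as a description of the result rather than a recipe for mutating state.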

Related

Can somebody tell me what error I have in this BPM

This code will auto-generate a new part number. It is a post-processing BPM for the GetNewPart BO.
int iPartnum = 0;
string cPartid = string.Empty;
Erp.Tables.Company Company;
foreach (var ttpart_xRow in ttPart)
{
    var ttpartRow = ttpart_xRow;
    Company = (from Company_Row in Db.Company
               where Company_Row.Company == Session.CompanyID
               select Company_Row).FirstOrDefault();
    iPartnum = (decimal)Company["AutoGenerate_c"] + 1;
    cPartid = System.Convert.ToString(iPartnum);
    ttpartRow.PartNum = cPartid;
    Services.Lib.UpdateTableBuffer._UpdateTableBuffer(Company, "AutoGenerate_c", iPartnum);
}
Is it just not working or is there an error message?
Services.Lib.UpdateTableBuffer._UpdateTableBuffer(Company,"AutoGenerate_c", iPartnum);
I have personally never used or even seen this Lib item so I can't vouch for it. I would update the object manually inside of a transaction scope because I doubt GetNewPart ever touches that database and therefore probably doesn't create a transaction.
using (System.Transactions.TransactionScope txScope = IceDataContext.CreateDefaultTransactionScope()) // start the transaction
{
    // Your logic goes here
    Db.Validate();
    txScope.Complete(); // commit the transaction
}
As a side note, I try to keep these sorts of things off of the company record because nearly every process in the system touches it and I don't want a process to lock it up or cause weird race conditions. I generally like to reserve a record that will only get touched for this specific purpose so I have a UDCodeType/UDCode for this sort of thing.
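Putting that together, the loop body might end up looking roughly like this. This is only an untested sketch: it reuses the Db, Session, IceDataContext and ttPart objects from the question, and it assumes the custom AutoGenerate_c column holds a decimal that can be written back through the same indexer used to read it.
foreach (var ttpartRow in ttPart)
{
    using (var txScope = IceDataContext.CreateDefaultTransactionScope())
    {
        var company = (from companyRow in Db.Company
                       where companyRow.Company == Session.CompanyID
                       select companyRow).FirstOrDefault();

        // Bump the counter and stamp the part number inside the transaction,
        // so two concurrent calls can't hand out the same value.
        decimal nextNumber = (decimal)company["AutoGenerate_c"] + 1;   // assumes the UD column is a decimal
        company["AutoGenerate_c"] = nextNumber;                        // assumes the indexer is writable
        ttpartRow.PartNum = System.Convert.ToString(nextNumber);

        Db.Validate();
        txScope.Complete();
    }
}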

Why is the second LINQ query faster?

In this code:
static bool Spin(int WaitTime)
{
    Console.WriteLine("Running task {0} : thread {1}]",
        Task.CurrentId, Thread.CurrentThread.ManagedThreadId);
    Thread.Sleep(WaitTime);
    return true;
}

public void DemoPLINQLong()
{
    var SomeBigNumber = 1000000;
    var sequence = Enumerable.Range(0, SomeBigNumber);
    var sw = new Stopwatch();
    sw.Start();
    sequence.Where(i => Spin(SomeBigNumber));
    sw.Stop();
    var synchTime = sw.Elapsed;
    sw.Restart();
    sequence.Where(i => Spin(SomeBigNumber));
    sw.Stop();
    var asynchTime = sw.Elapsed;
    Console.WriteLine("Synchronous: {0} Asynchronous: {1}",
        synchTime.ToString(), asynchTime.ToString());
}
The results are consistent:
Synchronous: 00:00:00.0021800 Asynchronous: 00:00:00.0000076
Why is the second LINQ query hundreds of times faster? Is there some kind of caching going on? How?
.NET compiles and optimizes code the first time it is executed; this is the Just-In-Time (JIT) compiler at work. Upon subsequent calls to the same code, the runtime can reuse the already-compiled, optimized code, which is why you'll frequently see the first run of nearly anything being much slower than subsequent runs of the same code.
A couple of side notes about the posted code:
Not sure what the "Synchronous" and "Asynchronous" labels are referring to; both examples do exactly the same thing, and there is nothing asynchronous about either of them.
If you're not aware, none of the LINQ is actually being evaluated in the example, due to LINQ's deferred execution. You can see this behavior if you change the example from sequence.Where(i => Spin(SomeBigNumber)) to sequence.Where(i => Spin(SomeBigNumber)).ToList(). The ToList() forces evaluation of the LINQ predicate, and the Console.WriteLine in the Spin method will actually be written to the console.
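A stand-alone way to see the deferred execution for yourself (a sketch, not the original benchmark - the numbers are scaled down so it finishes quickly):
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class DeferredExecutionDemo
{
    static bool Spin(int waitTime)
    {
        Thread.Sleep(waitTime);
        return true;
    }

    static void Main()
    {
        var sequence = Enumerable.Range(0, 100);

        var sw = Stopwatch.StartNew();
        var query = sequence.Where(i => Spin(10));   // only builds the query; Spin never runs
        Console.WriteLine("Build only: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        var results = query.ToList();                // enumerates; Spin runs once per element
        Console.WriteLine("Evaluated: {0} ms", sw.ElapsedMilliseconds); // roughly 100 x 10 ms
    }
}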

Why is performance so bad on this EF data import method?

I have a loop, abridged here, that performs an import of employee records, as follows:
var cts2005 = new Cts2005Entities();
IEmployeeRepository repository = new EmployeeRepository();
foreach (var c in cts2005.Candidates)
{
    var e = new Employee();
    e.RefNum = c.CA_EMP_ID;
    e.TitleId = GetTitleId(c.TITLE);
    e.Initials = c.CA_INITIALS;
    e.Surname = c.CA_SURNAME;
    repository.Insert(e);
}
There are actually several more fields, and a total of nine lookups like GetTitleId(c.TITLE)
above. Code for these is all exactly like this:
private List<Title> _titles;

private Guid GetTitleId(string titleName)
{
    ITitleRepository repository = new TitleRepository();
    if (_titles == null)
    {
        _titles = repository.ListAll().ToList();
    }
    var title = _titles.FirstOrDefault(t => String.Compare(t.Name, titleName, StringComparison.OrdinalIgnoreCase) == 0);
    if (title == null)
    {
        title = new Title { Name = titleName };
        repository.Insert(title);
        _titles.Add(title);
    }
    return title.Id;
}
All repository.Insert() calls look like this, except the entity types differ:
public void Insert(Employee entity)
{
    CurrentDbContext.Employees.Add(entity);
    CurrentDbContext.SaveChanges();
}
And all PKs are Guids. I know this could be a small problem, but I didn't expect it to have such a large effect with volumes this small.
I have done no tuning or optimization on this routine yet, as it was only for my small test db, but yesterday I was forced to do a surprise import of 6000 records. Toward the end, this process had slowed to about 1 second per record, which is quite dismal. I wouldn't have expected high speeds without some tuning, but nothing as slow as that.
Is there anything obviously, grossly wrong with my method here?
Your GetTitleId pulls all Titles from the database into the application once, but it then does a linear search over all of them for every Candidate. That is likely to be very expensive. Use a client-side hash table (a Dictionary) keyed with StringComparer.OrdinalIgnoreCase.
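For example, the title cache could become a dictionary - a sketch based on the GetTitleId method shown above (the Title and repository types are the question's own):
private Dictionary<string, Title> _titlesByName;

private Guid GetTitleId(string titleName)
{
    ITitleRepository repository = new TitleRepository();
    if (_titlesByName == null)
    {
        // O(1) lookups instead of a linear scan per candidate
        _titlesByName = repository.ListAll()
            .ToDictionary(t => t.Name, StringComparer.OrdinalIgnoreCase);
    }

    Title title;
    if (!_titlesByName.TryGetValue(titleName, out title))
    {
        title = new Title { Name = titleName };
        repository.Insert(title);
        _titlesByName[titleName] = title;
    }
    return title.Id;
}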
Also, why don't you profile your application? Put load on it and hit break in the debugger 10 times. Where does it stop most of the time? This is the hot-spot.
Thanks to the comment from marc_s above, I changed the routine to only call SaveChanges every 500 inserts, and the speed improved by about 500%.
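In case it helps anyone else, the batching change amounts to something like the following sketch. It assumes the SaveChanges call is moved out of the repository's Insert (shown earlier) and exposed separately - the repository.SaveChanges() call here is hypothetical, standing in for a pass-through to CurrentDbContext.SaveChanges():
int pending = 0;
foreach (var c in cts2005.Candidates)
{
    var e = new Employee();
    e.RefNum = c.CA_EMP_ID;
    e.TitleId = GetTitleId(c.TITLE);
    e.Initials = c.CA_INITIALS;
    e.Surname = c.CA_SURNAME;
    repository.Insert(e);              // now only does Add(), no SaveChanges()

    if (++pending % 500 == 0)
    {
        repository.SaveChanges();      // hypothetical: flushes CurrentDbContext.SaveChanges()
    }
}
repository.SaveChanges();              // flush the remainder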

Can I use LINQ to retrieve only "on change" values?

What I'd like to be able to do is construct a LINQ query that retrieved me a few values from some DataRows when one of the fields changes. Here's a contrived example to illustrate:
Observation Temp Time
------------- ---- ------
Cloudy 15.0 3:00PM
Cloudy 16.5 4:00PM
Sunny 19.0 3:30PM
Sunny 19.5 3:15PM
Sunny 18.5 3:30PM
Partly Cloudy 16.5 3:20PM
Partly Cloudy 16.0 3:25PM
Cloudy 16.0 4:00PM
Sunny 17.5 3:45PM
I'd like to retrieve only the entries when the Observation changed from the previous one. So the results would include:
Cloudy 15.0 3:00PM
Sunny 19.0 3:30PM
Partly Cloudy 16.5 3:20PM
Cloudy 16.0 4:00PM
Sunny 17.5 3:45PM
Currently there is code that iterates through the DataRows, does the comparisons, and constructs the results, but I was hoping to use LINQ to accomplish this.
What I'd like to do is something like this:
var weatherStuff = from row in ds.Tables[0].AsEnumerable()
                   where row.Field<string>("Observation") != weatherStuff.ElementAt(weatherStuff.Count() - 1)
                   select row;
But that doesn't work - and doesn't compile since this tries to use the variable 'weatherStuff' before it is declared.
Can what I want to do be done with LINQ? I didn't see another question like it here on SO, but could have missed it.
Here is one more general thought that may be interesting. It's more complicated than what @tvanfosson posted, but in a way, it's more elegant I think :-). The operation you want to do is to group your observations using the first field, but you want to start a new group each time the value changes. Then you want to select the first element of each group.
This sounds almost like LINQ's group by, but it is a bit different, so you can't really use the standard group by. However, you can write your own version (that's the wonder of LINQ!). You can either write your own extension method (e.g. GroupByMoving), or you can write an extension method that changes the type from IEnumerable to an interface of your own and then define GroupBy for that interface. The resulting query will look like this:
var weatherStuff =
    from row in ds.Tables[0].AsEnumerable().AsMoving()
    group row by row.Field<string>("Observation") into g
    select g.First();
The only thing that remains is to define AsMoving and implement GroupBy. This is a bit of work, but it is a quite generally useful thing that can be used to solve other problems too, so it may be worth doing :-). The summary of my post is that the great thing about LINQ is that you can customize how the operators behave to get quite elegant code.
I haven't tested it, but the implementation should look like this:
// Interface & simple implementation so that we can change GroupBy
interface IMoving<T> : IEnumerable<T> { }

class WrappedMoving<T> : IMoving<T> {
    public IEnumerable<T> Wrapped { get; set; }

    public IEnumerator<T> GetEnumerator() {
        return Wrapped.GetEnumerator();
    }

    // Explicit non-generic implementation required by IEnumerable
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
        return Wrapped.GetEnumerator();
    }
}

// Important bits:
static class MovingExtensions {
    public static IMoving<T> AsMoving<T>(this IEnumerable<T> e) {
        return new WrappedMoving<T> { Wrapped = e };
    }

    // This is (an ugly & imperative) implementation of the
    // group by as described earlier (you can probably implement it
    // more nicely using other LINQ methods). It takes IMoving<T> so
    // it doesn't clash with the standard Enumerable.GroupBy.
    public static IEnumerable<IEnumerable<T>> GroupBy<T, K>(this IMoving<T> source,
        Func<T, K> keySelector) {
        List<T> elementsSoFar = new List<T>();
        IEnumerator<T> en = source.GetEnumerator();
        if (en.MoveNext()) {
            K lastKey = keySelector(en.Current);
            do {
                K newKey = keySelector(en.Current);
                if (!EqualityComparer<K>.Default.Equals(newKey, lastKey)) {
                    yield return elementsSoFar;
                    elementsSoFar = new List<T>();
                    lastKey = newKey;
                }
                elementsSoFar.Add(en.Current);
            } while (en.MoveNext());
            yield return elementsSoFar;
        }
    }
}
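A quick way to sanity-check it, with plain strings standing in for the DataRows (made-up sample data):
var observations = new[] { "Cloudy", "Cloudy", "Sunny", "Sunny", "Partly Cloudy", "Cloudy" };
var firstOfEachRun = observations.AsMoving()
                                 .GroupBy(o => o)
                                 .Select(g => g.First());
Console.WriteLine(string.Join(", ", firstOfEachRun));
// Prints: Cloudy, Sunny, Partly Cloudy, Cloudy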
You could use the IEnumerable extension that takes an index.
var all = ds.Tables[0].AsEnumerable();
var weatherStuff = all.Where( (w,i) => i == 0 || w.Field<string>("Observation") != all.ElementAt(i-1).Field<string>("Observation") );
This is one of those instances where the iterative solution is actually better than the set-based solution in terms of both readability and performance. All you really want Linq to do is filter and pre-sort the list if necessary to prepare it for the loop.
It is possible to write a query in SQL Server (or various other databases) using windowing functions (ROW_NUMBER), if that's where your data is coming from, but very difficult to do in pure Linq without making a much bigger mess.
If you're just trying to clean the code up, an extension method might help:
public static IEnumerable<T> Changed<T>(this IEnumerable<T> items,
    Func<T, T, bool> equalityFunc)
{
    if (equalityFunc == null)
    {
        throw new ArgumentNullException("equalityFunc");
    }
    T last = default(T);
    bool first = true;
    foreach (T current in items)
    {
        if (first || !equalityFunc(current, last))
        {
            yield return current;
        }
        last = current;
        first = false;
    }
}
Then you can call this with:
var changed = rows.Changed((r1, r2) =>
r1.Field<string>("Observation") == r2.Field<string>("Observation"));
I think what you are trying to accomplish is not possible using the query-expression syntax sugar. However, it should be possible using the Select overload that passes in the index of the item you are evaluating. You could then use the index to compare the current item with the previous one (index - 1).
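Something along these lines (only a sketch; the rows are materialised into a list first so the index-based lookup of the previous row is cheap):
var rows = ds.Tables[0].AsEnumerable().ToList();
var changed = rows
    .Select((row, index) => new { row, index })
    .Where(x => x.index == 0 ||
                x.row.Field<string>("Observation") !=
                rows[x.index - 1].Field<string>("Observation"))
    .Select(x => x.row);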
You could use MoreLINQ's GroupAdjacent() extension method
GroupAdjacent: Groups the adjacent elements of a sequence according to
a specified key selector function...This method has 4 overloads.
You would use it like this, with the result-selector overload to lose the IGrouping key:
var weatherStuff = ds.Tables[0].AsEnumerable().GroupAdjacent(w => w.Field<string>("Observation"), (_, val) => val.Select(v => v));
This is a very popular extension to the default LINQ methods, with more than 1M downloads on NuGet (compared to MS's own Ix.NET with ~40k downloads at the time of writing).

Linq Efficiency question - foreach vs aggregates

Which is more efficient?
//Option 1
foreach (var q in baseQuery)
{
    m_TotalCashDeposit += q.deposit.Cash;
    m_TotalCheckDeposit += q.deposit.Check;
    m_TotalCashWithdrawal += q.withdraw.Cash;
    m_TotalCheckWithdrawal += q.withdraw.Check;
}

//Option 2
m_TotalCashDeposit = baseQuery.Sum(q => q.deposit.Cash);
m_TotalCheckDeposit = baseQuery.Sum(q => q.deposit.Check);
m_TotalCashWithdrawal = baseQuery.Sum(q => q.withdraw.Cash);
m_TotalCheckWithdrawal = baseQuery.Sum(q => q.withdraw.Check);
I guess what I'm asking is, calling Sum will basically enumerate over the list right? So if I call Sum four times, isn't that enumerating over the list four times? Wouldn't it be more efficient to just do a foreach instead so I only have to enumerate the list once?
It might, and it might not, it depends.
The only sure way to know is to actually measure it.
To do that, use BenchmarkDotNet, here's an example which you can run in LINQPad or a console application:
void Main()
{
    BenchmarkSwitcher.FromAssembly(GetType().Assembly).RunAll();
}

public class Benchmarks
{
    [Benchmark]
    public void Option1()
    {
        // foreach (var q in baseQuery)
        // {
        //     m_TotalCashDeposit += q.deposit.Cash;
        //     m_TotalCheckDeposit += q.deposit.Check;
        //     m_TotalCashWithdrawal += q.withdraw.Cash;
        //     m_TotalCheckWithdrawal += q.withdraw.Check;
        // }
    }

    [Benchmark]
    public void Option2()
    {
        // m_TotalCashDeposit = baseQuery.Sum(q => q.deposit.Cash);
        // m_TotalCheckDeposit = baseQuery.Sum(q => q.deposit.Check);
        // m_TotalCashWithdrawal = baseQuery.Sum(q => q.withdraw.Cash);
        // m_TotalCheckWithdrawal = baseQuery.Sum(q => q.withdraw.Check);
    }
}
BenchmarkDotNet is a powerful library for measuring performance, and is much more accurate than simply using Stopwatch, as it will use statistically correct approaches and methods, and also take such things as JITting and GC into account.
Now that I'm older and wiser I no longer believe using Stopwatch is a good way to measure performance. I won't remove the old answer, as Google and similar links may lead people here looking for how to use Stopwatch to measure performance, but I hope I have added a better approach above.
Original answer below
Simple code to measure it:
Stopwatch sw = new Stopwatch();
sw.Start();
// your code here
sw.Stop();
Debug.WriteLine("Time taken: " + sw.ElapsedMilliseconds + " ms");
sw.Reset(); // in case you have more code below that reuses sw
You should run the code multiple times to keep JITting from having too large an effect on your timings.
I went ahead and profiled this and found that you are correct.
Each Sum() effectively creates its own loop. In my simulation, I had it sum a SQL dataset with 20,319 records, each with 3 summable fields, and found that creating your own loop had a 2X advantage.
I had hoped that LINQ would optimize this away and push the whole burden onto the SQL server, but unless I move the sum request into the initial LINQ statement, it executes each request one at a time.
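If you want a single enumeration without giving up the query style entirely, the four sums can also be folded into one pass over the sequence. This is only a LINQ-to-Objects sketch: it assumes the q.deposit / q.withdraw shapes from the question and decimal amounts, and it won't translate to SQL.
var totals = baseQuery.Aggregate(
    new { CashDep = 0m, CheckDep = 0m, CashWd = 0m, CheckWd = 0m },
    (acc, q) => new
    {
        CashDep  = acc.CashDep  + q.deposit.Cash,
        CheckDep = acc.CheckDep + q.deposit.Check,
        CashWd   = acc.CashWd   + q.withdraw.Cash,
        CheckWd  = acc.CheckWd  + q.withdraw.Check
    });

m_TotalCashDeposit     = totals.CashDep;
m_TotalCheckDeposit    = totals.CheckDep;
m_TotalCashWithdrawal  = totals.CashWd;
m_TotalCheckWithdrawal = totals.CheckWd;
Whether that's clearer than the plain foreach in Option 1 is debatable - the foreach is still the simplest single-pass version.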
