Long-running task in WebAPI - task-parallel-library

Here's my problem: I need to call multiple third-party methods inside an ApiController. The signature for those methods is Task DoSomethingAsync(SomeClass someData, SomeOtherClass moreData). I want those calls to continue running in the background, after the ApiController has sent the data back to the client. When DoSomethingAsync completes, I want to do some logging and maybe save some data to the file system. How can I do that? I'd prefer to use the async/await syntax.

Great news: there is a new solution in .NET 4.5.2 called the QueueBackgroundWorkItem API. It's really simple to use:
HostingEnvironment.QueueBackgroundWorkItem(ct => DoSomething(a, b, c));
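A minimal sketch of how that could look in the controller from the question, where DoSomethingAsync, SomeClass, and SomeOtherClass are the third-party pieces described above (the action shape and the Ok() response are illustrative):

using System.Threading.Tasks;
using System.Web.Hosting;
using System.Web.Http;

public class WidgetController : ApiController // hypothetical controller
{
    public IHttpActionResult Post(SomeClass someData, SomeOtherClass moreData)
    {
        // The async overload takes a Func<CancellationToken, Task>.
        HostingEnvironment.QueueBackgroundWorkItem(async ct =>
        {
            await DoSomethingAsync(someData, moreData);
            // logging / file-system writes after completion would go here
        });

        return Ok(); // the response is sent while the work item keeps running
    }
}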
Here's an article that describes it in detail.
https://blogs.msdn.microsoft.com/webdev/2014/06/04/queuebackgroundworkitem-to-reliably-schedule-and-run-background-processes-in-asp-net/
And here's another article that mentions a few other approaches not covered in this thread.
http://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx

You almost never want to do this. It is almost always a big mistake.
ASP.NET (and most other servers) work on the assumption that it's safe to tear down your service once all requests have completed. So you have no guarantee that your logging will be done, or that your data will be written to disk. Particularly with the disk writes, it's entirely possible that your writes will be corrupted.
That said, if you are absolutely sure that you want to implement this extremely dangerous design, you can use the BackgroundTaskManager from my blog.
Update: I've written a blog series that goes into detail on a proper solution for request-extrinsic code. In summary, what you really want to do is move the request-extrinsic code out of ASP.NET. Introduce a durable queue and an independent processor; the ASP.NET controller action will place a request onto the queue, and the independent processor will read requests and execute them. This "processor" can be an Azure Function/WebJob, Win32 Service, etc.
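As a rough sketch of that shape, assuming an Azure Storage queue via the Azure.Storage.Queues client (the queue name, connection string, and message format are all illustrative):

using Azure.Storage.Queues;
using Newtonsoft.Json;
using System.Threading.Tasks;
using System.Web.Http;

public class WorkController : ApiController
{
    // Illustrative: the connection string would come from configuration.
    private static readonly QueueClient Queue =
        new QueueClient("<storage-connection-string>", "background-work");

    public async Task<IHttpActionResult> Post(SomeClass someData)
    {
        // The controller only enqueues; an independent processor
        // (Azure Function, WebJob, Win32 service, ...) does the actual work.
        await Queue.SendMessageAsync(JsonConvert.SerializeObject(someData));
        return Ok();
    }
}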

Stephen described why starting what are essentially long-running, fire-and-forget tasks inside an ApiController is a bad idea.
Perhaps you should create a separate service to execute those fire-and-forget tasks. That service could be a different ApiController, a worker behind a queue, anything that can be hosted on its own and have an independent lifetime.
This would make management of the different task lifetimes much easier and separate the concerns of the long-running tasks from the ApiController's core responsibilities.

As pointed out by others, this is not recommended. However, where there is a need there is a way, so take a look at IRegisteredObject.
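A bare-bones sketch of the registration dance, with the actual work tracking left out (the class name and shutdown handling here are illustrative, not a production implementation):

using System.Web.Hosting;

public class BackgroundWorkRegistration : IRegisteredObject
{
    public BackgroundWorkRegistration()
    {
        // Ask ASP.NET to notify this object before the app domain is torn down.
        HostingEnvironment.RegisterObject(this);
    }

    public void Stop(bool immediate)
    {
        // Called twice on shutdown: first with immediate == false (wind down
        // in-flight work), then with immediate == true (stop now).
        if (immediate)
        {
            HostingEnvironment.UnregisterObject(this);
        }
    }
}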
See also
http://haacked.com/archive/2011/10/16/the-dangers-of-implementing-recurring-background-tasks-in-asp-net.aspx/

Though the question is several years old, the best possible solution now is to use SignalR in this case.
https://github.com/Myrmex/signalr-notify-progress
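For classic ASP.NET SignalR, the shape is roughly this (the hub name and the reportProgress callback are made-up names for illustration):

using Microsoft.AspNet.SignalR;

public class ProgressHub : Hub { }

public static class ProgressNotifier
{
    public static void Report(string message)
    {
        // Push progress to all connected clients; reportProgress is a
        // client-side (JavaScript) handler, resolved dynamically.
        var hub = GlobalHost.ConnectionManager.GetHubContext<ProgressHub>();
        hub.Clients.All.reportProgress(message);
    }
}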

Related

FileNet BPM web service receive step design considerations

We are currently designing a web-service-based process in which we will be using the web-service invoke and receive steps to communicate with a Microsoft BizTalk server.
Our main concern is that a task on the receive step can wait for some time (up to one week) until BizTalk responds to us, which (we think) would incur a performance penalty on the workflow system as it will be polling for a response.
My question is: are there any known performance considerations for the receive step, especially for leaving work items for extended periods?
No, I don't think there will be any undue "overhead". Yes, internally the process engine "polls" for just about anything, including invoking components or executing timers. But from a system perspective, you're just waiting for a request.
It sounds like a "receive" step is exactly the right solution here.

Are Service Fabric services entirely single-threaded?

I'm trying to get to grips with Service Fabric and I'm struggling a little bit. Some questions:
Are all Service Fabric service instances single-threaded? I created a stateless Web API, one instance, with a method that did a Task.Delay, then returned a string. Two requests to this service were served one after the other, not concurrently. Am I right in thinking, then, that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit: Thinking about this, it is probably to do with the setup of the OWIN Web API. Could it be blocking by session? I assumed there is no session by default.
I have long-running operations that I need to perform in Service Fabric (they can take several hours). Is there a recommended pattern that I can use for this in Service Fabric? These are currently handled using a storage queue that triggers a WebJob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part, so I will comment on the second part: "long-running operations".
We have seen long-running operations and workflows being handled since long before Service Fabric came about. For this reason, we can build on the shoulders of giants by looking at the design patterns that software experts have been using for decades, for example the famous and all-inclusive Process Manager. Mind you, this pattern is sometimes overkill; if it is in your case, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections, those are implementation details when choosing a data structure to support the chosen design pattern.
I hope that helps.
With regard to your second point: it really depends on the nature of your long-running task.
Is your long-running task the kind of workload that runs on an isolated thread, depends on local OS/VM-level resources, and eventually comes back with a result (A)? Or is it the kind of long-running task that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long-running workloads (A), but more for writing horizontally scalable, highly available systems.
If you were absolutely keen on using Service Fabric (and your kind of workload tends to be more like B than A), I would definitely find a way to break those long-running tasks down into smaller units of work that could be processed in parallel across the cluster. But even then, there are probably more appropriate technologies designed for this, such as Azure Batch.
P.S. If you are going to put a long-running process in the RunAsync method, you should design the workload so it is interruptible and its state can be persisted in a way that allows it to be resumed from another node in the cluster:
In a stateful service, only the primary replica has write access to state and thus is generally when the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is cancelled when a primary replica's role changes away from primary, as well as during the close and abort events.
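To make that concrete, here is a minimal sketch of a RunAsync loop draining a reliable queue inside a stateful service (WorkItem and ProcessStepAsync are placeholder names, not part of the Service Fabric API):

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    var queue = await StateManager.GetOrAddAsync<IReliableQueue<WorkItem>>("work");

    while (true)
    {
        // Honor demotion/shutdown: cancellation is how the runtime stops us.
        cancellationToken.ThrowIfCancellationRequested();

        using (var tx = StateManager.CreateTransaction())
        {
            var item = await queue.TryDequeueAsync(tx);
            if (item.HasValue)
            {
                // Persist progress per step so another replica can resume the work.
                await ProcessStepAsync(item.Value, cancellationToken);
                await tx.CommitAsync();
            }
        }

        await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
    }
}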
P.P.S. Long-running operations are the devil when trying to write scalable systems. Try to tackle that now and save yourself the future pain if possible.
To the first point: this is purely a client issue. Chrome saw my requests as identical and so delayed the second request until the first got a response. Varying the parameters of the requests allowed them to be served concurrently.

Workflow Waiting Forever

I have a workflow that runs when an entity is created; it creates two other entities and puts them on a queue. It then waits until each entity's status reason is set to Done, after which it continues.
Basically two teams will work an order and then it will continue processing after both teams are done.
Most of the time it works. However, sometimes it waits forever. I'll re-activate and re-resolve the other tasks, but it just never wakes up.
What can I do? The workflows aren't really powerful enough for me to have them poll with a timeout (there are no loops). I'd like to avoid on-change plugins for these other entities, which would get workflow behavior all scattered about.
Edit:
Restarting the CRM services (not sure which did it, I restarted them all) allowed the workflow to resume. However, I'd still like to know how to make this more reliable.
I had the same problem (and a lot more) with workflows in CRM 2011 and decided not to use them (except for very special purposes).
The main reason is their very limited error handling. Another reason is that it is inconvenient to put them under source control. Further reasons: workflows cannot run offline, and user impersonation is also not supported. For a comparison, look here: http://goo.gl/9ht1QJ
Use plugins instead of workflows, then you have full control.
But keep in mind that plugins (unlike workflows) are not designed for long running tasks.
So they have a default maximum execution time of 120 seconds and are not stateful/persisted. But in most cases (and I think also in your case) that is not a problem.
Just change your eventing a little bit:
Implement and register a plugin step for: entity is created and it creates two other entities and puts them on a queue
Implement and register another step for: entity's status reason is set to Done; query for the other entity, check its status, and if both are done, continue processing (a minimal sketch follows)
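As a rough sketch of that second step, assuming a standard IPlugin registered on the status-change message (the entity and attribute handling is deliberately left as a comment, since it depends on the actual customizations):

using System;
using Microsoft.Xrm.Sdk;

public class StatusDonePlugin : IPlugin
{
    public void Execute(IServiceProvider serviceProvider)
    {
        var context = (IPluginExecutionContext)
            serviceProvider.GetService(typeof(IPluginExecutionContext));
        var factory = (IOrganizationServiceFactory)
            serviceProvider.GetService(typeof(IOrganizationServiceFactory));
        var service = factory.CreateOrganizationService(context.UserId);

        var target = (Entity)context.InputParameters["Target"];
        // Use 'service' to query for the sibling entity, check whether its
        // status reason is also Done, and if so kick off the follow-up logic.
    }
}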
If you really do not want to use plugins for your business logic, you could consider implementing a plugin which restarts/resumes faulted workflows.
But that's not a very nice solution.

Concurrency and restriction to Serializable

I am currently writing an application which requires me to compute something that will take some time to complete, so I am doing this computation in the background. I have implemented a solution which starts new threads for each such request via an ExecutorService. These threads regularly report their progress back to a (volatile) IModel. Additionally, I am using an AjaxSelfUpdatingTimerBehavior which updates the website by printing the progress represented by this IModel to the screen. By doing so, the website stays responsive, the task can be interrupted by a button click, and the HTTP request which triggered the long-lasting task does not time out.
However, Wicket does not like non-Serializable references in its WebPage or Panel instances, and I wonder what would be the best way of solving this problem. For now, I wrote a little manager class which uses a cache referenced by a static variable, which is how I am avoiding the serialization restriction. The WebPage instance which triggered the task now only holds a reference to a unique ID assigned to it by my manager class when invoking the task.
Of course, with this approach I have to clean up after myself, and I am also concerned about security, since I have not yet taken steps to prevent interference between tasks started by different users. Also, it just feels wrong to me, since I want to keep this task in the scope of the WebPage instead of letting the task escape into a global environment. I am sure there is a better way to do this!
Thanks for any thoughts on this matter and for sharing your experience!
Your approach sounds perfectly reasonable: pass the task handling to a non-web instance (which could be a Spring-managed singleton) and just keep an identifier in your component/model.

Looking for pattern/approach/suggestions for handling long-running operation tied to web app

I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
Customer comes to the site and specifies object/widget they are looking for.
We search/clean/filter for widgets matching some initial criteria. <-- long running process
Customer further configures more detail about the widget they are looking for.
When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack, currently using PHP. It doesn't seem like a good design to have the long-running process take up an Apache thread in mod_php (or a FastCGI process). The Apache layer of our app should be focused on serving up content, not data processing, IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g., do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once
If several copies of the process run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of ram and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
Another option is to have a daemon permanently running, waiting for requests, which processes them and then records the results somewhere (perhaps in a database).
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue, which a daemon then reads and processes.
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
Java servlets can do background processing. You could do something similar in any web technology with threading support. I don't know about PHP, though.
Not a complete answer, but I would think about using AJAX and passing the second step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off of some queue, most likely just a database.
