Clarifying... So Background Jobs don't Tie Up Application Resources (in Rails)? - ruby

I'm trying to get a better grasp of the inner workings of background jobs and how they improve performance.
I understand that the goal is to have the application return a response to the user as fast as it can, so you don't want to, say, parse a huge feed that would take 10 seconds because it would prevent the application from being able to process any other requests.
So it's recommended to put any operations that take more than say 500ms to execute, into a queued background job.
What I don't understand is, doesn't that just delay the same problem? I know the user who invoked that background job will get an immediate response, but what if another user comes right when that background job starts (and it takes 10 seconds to finish), wont that user have to wait?
Or is the main issue that, requests are the only thing that can happen one-at-a-time, while on the other hand a request can start while one+ background jobs are in the middle of running?
Is that correct?

The idea of a background process is that it takes care of all the long running processes.
Basically, it is an external application that is running outside of the webserver with one or several processes that handles the requests.
So, it doesn't matter if there is another user requesting a page since it the job is not occupying the webserver, the user will not have to wait for anything to finish.
If that user also do something that is being put in the background queue, then it will just stack up there until the first one is finished (or in the case where there are multiple processes handling it, as soon as there is one available).
Hope this explanation makes it a bit more clearer :)

Related

How to keep webserver responsive while executing many asynchronous background tasks

I am working on a web application that provides its users to optionally execute long-running processes 'in background'. An example would be some long-running report generation, or deleting thousands of objects simultaneously.
I've implemented this using an ExecutorService defined as FixedThreadPool using a ThreadFactory. The ThreadFactory is built like this:
ThreadFactoryBuilder()
.setNameFormat(clientId + "-BackgroundTask-%d")
.setDaemon(true)
.setPriority(Thread.MIN_PRIORITY)
.build()
I execute the task like this:
Future<TaskStatus> future = clientExecutors.get(clientId).submit(
backgroundTask::execute);
taskFutures.put(backgroundTask.getTaskId(), future);
How can I enforce my webserver to always priorize handling new incoming requests (as fast as possible) over executing background tasks?
In other words: It should never ever happen, that a user has to wait long time while browsing the site, just because there are a lot of background-tasks executing. As you can see from above, I tried to do this by setting .setPriority(Thread.MIN_PRIORITY). However that does not seem to be sufficient.
Furthermore, as for now, I've set some arbitrary value for the FixedThreadPool size (10) and use it globally for the entire background-handling of the application (and all its customers).
Instead I would like to define a threadpool for each customer, to make sure each customer has the same privilege to run a certain amount of tasks in the background. Say, each customer has a FixedThreadPool of size 5, and on the server I'll have a max. of 50 different customers. That would add up to 250 running background tasks at the same time.
The most important requirement here is: it does not matter, how long these background-tasks need to execute (say 2 minutes, or 20 minutes). What is important, is that each customer has the ability to send 5 tasks to be executed in background, and each of those are worked on equally.
I've tested running 30 cpu-intensive background tasks and it turns out that while these are running and cpu is near 100%, new incoming requests take a very long time to be handled.
So obviously, I am doing it wrong.
Update 12.09.2017
I've read about microservices and while it sounds great I see a great challenge in splitting the necessary parts from our monolithic application. Mostly because nearly every operation might turn into a long running process given a big enough data selection.
Furthermore, wouldn't I run into the same problem with my microservice, i.e. the server running the microservice would suffer the same performance degradation. Well the only good thing would, that the rest of the web app would not suffer from it anymore.
I've read some posts about introducing Thread.sleep(1) or Thread.sleep in general into CPU-heavy operations to reduce the amount of CPU used in these operations. I've also read about someone who introduced this as an aspect so that he can even change the amount of time waited dynamically in order to have some control about how much cpu would be used.
However, my gut tells me that ain't right either. What do you think about introducing Thread.sleep to lower the amount of CPU used for a task? Is this common practice? If not, what would be the right approach?
I would highly consider changing your system architecture to offload these long-running requests to a separate instance instead of running them in-process with the general request-service application. In general I think it is an anti-pattern to handle both batch / online (or long / short running) processing in the same application instance.
Ideally you'd build a standalone microservice to handle these requests, but you could also simply just deploy X instances of your existing application, and configure your load balancer to route requests to the long running invocation paths (e.g. POST /myapp/longrunningjob) only to the instances dedicated to running these long-running processes.

slow down for in loop

I have a function that loops through a list of items by sending them to a server and grabbing the response. The problem I'm having is the loop is going faster than the server can handle. I need to figure out a way to slow the loop down without freezing the application. Is there a way to delay the loop from moving to the next item for a brief moment? In other languages, I'd use something like sleep(interval).
Don't slow the process down. Add the network calls to an operation queue with a limited number of concurrent operations. You may need to rewrite your network code as an NSOperation subclass but that's fairly straightforward. You can see some examples in this tutorial.
There is a built-in limit to the number of simultaneous network connections that can be made anyway, but it sounds like your server's limit is lower than that, or that you're saturating the network connections and your later calls are timing out before they've been able to start.
Instead of a sleep interval it sounds like you want a completion block that calls the same code again until the list is empty. So once it finishes the request, it goes onto the next one.
Also I don't think you should be trying to sleep since it will hold the main thread which results in a poor user experience.

Repeated tasks - spawn new processes or run continuously?

We have about 10 different Python scripts that download data from the web, read data from a database and write data back to that database. They do so repeatedly every 10 seconds (or 10 seconds after the last task has completed).
The question is, what is the best approach at running these tasks? I can think of a few ways:
a while True that runs the task then sleeps for the interval. It could be guarded by a watchdog like supervisord, making sure it is always up.
having the script execute the task just once, and invoking the script externally once every 10 seconds by another process.
having the script execute the task lets say for 1 hour (every 10 seconds for an hour), and having a watchdog make sure that task runs again once the hour is over.
I would like to avoid long running processes that actually do something because I don't want to deal with memory problems etc over long periods of time.
Additional Information
The scripts are different because they each retrieve data from a different source, and query, calculate and insert different data into the database.
The tasks are performed every 10 seconds since the data being retrieve is in real-time, and we need to not only keep updating it very frequently, but also keep all the historical data in the database.
There are a lot of resources being used by the scripts - MySQL connections, HTTP connections, Redis connections, etc. We have encountered issues with using the long-running approach before, specifically with MySQL connections (things like MySQL server has gone away, even though all connections had been closed). Hence the inclination toward having the scripts run in shorter periods of time.
What are some common approaches at this?
Unless your scripts somehow leak memory (quite unlikely), they should all be the same. So, for sheer simplicity (your time programming/debugging is much more expensive than a few miliseconds of the machine's time, even each 10 seconds!) I'd go for the single script that checks each 10 seconds.
OTOH, checking each 10 seconds sounds like busywork. Can't you set up so that whatever you are monitoring tells you when there are changes? Or batch the records up so you can retrieve, say, a day's worth at at time?
If you are running on linux, cron has granularity of a minute. We have processes we run constantly. Rather than watch them, the script will open a semaphore that gets released when the program finishes normally or not. This way if it runs long and it gets called again by cron, the copy will exit when it can't get the lock. This way you can call it a often as you need to without it stepping on a possibly still running copy.

how to avoid a request timing out when uploading and resizing large amount of images in Coldfusion?

I'm running Coldfusion8 and have a cfc, that loops through a set of database records.
Each record contains two fields image path and image file. I'm constructing a path for every image, upload it to a temp folder, resize and then store it to S3.
Depending on the number of records, this may take quite some time and I have not been able to successfully finish the upload cycle with larger sets of images (eventually times out).
I'm already settings my timeout threshold to 5000, but it still does not seem enough.
I can pick up where I left, because I'm keeping a media log to check against, before uploading to S3. This way I can finish the task, but I need to trigger this function 5x to upload 400 items.
Question:
Is there way to avoid a timeout without setting (in S3 case) httptimeout to some 50000000? And would it make sense to run this in a CFTHREAD or will this be a problem if the user leaves the import page while the system is still uploading?
Thanks for some insights.
You can use a CFthread to perform the task, but make sure you LOCK THE SCOPE! otherwise you could end up running this memory intensive proccess several times over and kill the server, you only want this proccess running once at a time if its so intensive.
You have other options though, if this is not something that your application users will need to run and its a one-off proccess your doing, you could set a scheduled task with an exceedingly long timeout to run overnight, when the server is not very high use, This allows you to set the timeout independently to the application so the rest of the application is unaffected by global timeout changes.
Another option is, if this is something users will be doing semi-regularly then a thread which pushes a notification via email, log or other means (Ajax or Websockets) letting the user know they're task is complete. This has the upside that timeouts can be changed, calculated on the amount of data to be proccessed dynamically at thread generation. However, if your not careful you can overload your server with many threads proccessing large datasets (plus log file read-write locks will be harder to manage).
I would encourage you though, to take this away and see what solution works for you and post your final solution so others can see what the outcome is.
Hope this helps.

WP7 Max HTTPWebRequests

This is kind of a 2 part question
1) Is there a max number of HttpWebRequests that can be run at the same time in WP7?
I'm going to create a ScheduledTaskAgent to run a PeriodicTask. There will be 2 different REST service calls the first one will get a list of IDs for records that need to be downloaded, the second service will be used to download those records one at a time. I don't know how many records there will be my guestimage would be +-50.
2.) Would making all the individual record requests at once be a bad idea? (assuming that its possible) or should I wait for a request to finish before starting another?
Having just spent a week and a half working at getting a BackgroundAgent to stay within it's memory limits, I would suggest doing them one at a time.
You lose about half your memory to system libraries and the like, your first web request will take another nearly 20%, but it seems to reuse that memory on subsequent requests.
If you need to store the results into a local database, it is going to take a good chunk more. I have found a CompiledQuery uses less memory, which means holding a single instance of your context.
Between each call I would suggest doing a GC.Collect(), I even add a short Thread.Sleep() just to be sure the process has some time to tidying things up.
Another thing I do is track how much memory I am using and attempt to exit gracefully when I get to around 97 or 98%.
You can not use the debugger to test memory limits as the debug memory is much higher and the limits are not enforced. However, for comparative testing between versions of your code, the debugger does produce very similar result on subsequent runs over the same code.
You can track your memory usage with Microsoft.Phone.Info.DeviceStatus.ApplicationCurrentMemoryUsage and Microsoft.Phone.Info.DeviceStatus.ApplicationMemoryUsageLimit
I write a status log into IsolatedStorage so I can see the result of runs on the phone and use ScheduledActionService.LaunchForTest() to kick the off. I then use ShellToast notifications to let me know when the task runs and also when it completes, that way I can launch my app to read the status log without interrupting it.
Tyler,
My 2 cents here.
I don't believe there is any restriction on how mant HTTPWebequests you can spin up. These however have to be async, off course, and may be served from the browser stack. Most modern browsers including IE9 handle over 5 concurrently to the same domain; but you are not guaranteed a request handle immediately. However, it should not matter if you are willing to wait on a separate thread, dump your content on to the request pipe & wait for response on yet another thread. This post (here) has a nice walkthrough of why we need to do this.
Nothing wrong with this approach either, IMO. You're just going to have to wait until all the requests have their respective pipelines & then wait for the responses.
Thanks!
1) Your memory limit in a PeriodicTask or ResourceIntensiveTask is 5 MB. So you definitely should control your requests really careful. I dont think there is a limit in the code.
2)You have only 5 MB. So when you start all your requests at the same time it will terminate immediately.
3) I think you should better use a ResourceIntensiveTask because a PeriodicTask should only run 15 seconds.
Good guide for Multitasking features in Mango: http://blogs.infosupport.com/blogs/alexb/archive/2011/05/26/multi-tasking-in-windows-phone-7-1.aspx
I seem to remember (but can't find the reference right now) that the maximum number of requests that the OS can make at once is 7. You should avoid making this many at once though as it will stop other/system apps from being able to make requests.

Resources