Is there a hard limit on how long Azure role startup can take?

Suppose I include a rather long-running startup task in my Azure role, something that runs for up to several minutes. What happens if the startup task runs "too long"?
I'm currently testing on Compute Emulator and observe the following.
I have a 450-megabyte .zip file together with Info-Zip's unzip. The startup task unzips the archive. Deployment starts and I watch Task Manager. Numerous service processes start, then unzip.exe runs. After about two minutes all those processes stop, then start anew, and unzip.exe starts again.
So it looks like a deployment is allowed to run for about two minutes, then is forcefully reset and started again.
Is this the expected behavior? Does it persist in the real cloud? Are there any hard limits on how long a role startup can take? How do I address this situation, other than moving the unpacking into RoleEntryPoint.OnStart()?

I had the same question, so I tried an experiment. I ran a Startup Task - taskType="simple" so that it would block the Roles from beginning to execute - and let it run for 50 hours. The Fabric Controller did not complain and the portal did not show any error. It finished its long "do nothing" loop after the 50 hours were up, the Startup Task exited, and my Web Role started up fine.
So my empirical test says Startup Tasks can take a long time! At least 50 hours.
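For reference, a startup task of that kind is declared in ServiceDefinition.csdef. A minimal sketch, with a hypothetical unpack.cmd standing in for the long-running command, might look like this (taskType="simple" is what makes it block the role from starting until the task exits):

    <!-- Illustrative ServiceDefinition.csdef fragment; role and script names are placeholders -->
    <WebRole name="MyWebRole">
      <Startup>
        <!-- taskType="simple" blocks role startup until the task exits -->
        <Task commandLine="unpack.cmd" executionContext="elevated" taskType="simple" />
      </Startup>
    </WebRole>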

This should inform the load balancer that your process is still busy:
http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.serviceruntime.roleinstancestatuscheckeventargs.setbusy.aspx
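For context, a minimal sketch of how that status-check API is commonly wired up in a worker role; the class name and the ready flag are illustrative, not part of the linked documentation:

    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        // Set to true once the long-running startup work has finished.
        private static volatile bool ready;

        public override bool OnStart()
        {
            // While the instance is not ready, report Busy so the load
            // balancer keeps it out of rotation (heartbeats still pass).
            RoleEnvironment.StatusCheck += (sender, e) =>
            {
                if (!ready) e.SetBusy();
            };
            return base.OnStart();
        }
    }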

I have run startup tasks that take a pretty long time (think 20-30 minutes), and the role simply sits in a 'Busy' state. I don't think there is a hard limit on how long the role will stay in that state, as long as the startup task is still executing and has not exited with a non-zero return code (in fact, a common gotcha for first-time startup task authors is a task that pops an interactive prompt and never exits). As far as the Fabric Controller is concerned, the instance is still running just fine, so there is no reason to 'recover' the role (i.e. heartbeats are still going).
The dev emulator just notices that the role hasn't started and warns you. If you click the 'keep waiting' option, it will continue to run the startup task to completion. The cloud, of course, does not warn you.
I have never tried a task that ran extremely long, so there may well be a very high limit. I seem to recall three hours being a magic number in some timeout cases, such as role recycles, but I have never tested it...

There are heartbeats that the Azure Fabric Agent performs against the role. If these are not acknowledged (say, because of a long-running blocking process), the role could be flagged as unavailable.
You might try putting your startup process into a background thread that runs independently. This should help keep the role from being recycled while the process is starting up. Just keep in mind that you may need to make some adjustments if you get requests before the role has fully started. There is also a way (that I can't seem to recall at the moment) to flag the role and take it out of the load balancer temporarily while your process completes.
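A minimal sketch of that approach, extending the OnStart from the SetBusy example in the earlier answer (UnpackArchive and ready are illustrative names, not a prescribed API):

    // Extending the earlier OnStart sketch: kick off the slow work on a
    // background thread so OnStart returns quickly and heartbeats keep
    // being answered while the work runs.
    public override bool OnStart()
    {
        RoleEnvironment.StatusCheck += (sender, e) => { if (!ready) e.SetBusy(); };

        System.Threading.Tasks.Task.Run(() =>
        {
            UnpackArchive();   // hypothetical stand-in for the slow startup work
            ready = true;      // the StatusCheck handler now reports Ready
        });

        return base.OnStart();
    }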

Related

Run a script the whole time on a VPS server

Is it possible to create a script that is always running on my VPS server? And what do I need to do to keep it running the whole time? (I don't have a VPS server yet, but if this is possible I want to buy one!)
Yes, you can; there are several ways to get the result you want.
Supervisord
Supervisord is a process control system that keeps any process running. It automatically starts or restarts your process whenever necessary (a minimal configuration sketch follows the examples below).
When to use it: Use it when you need a process that runs continuously, e.g.:
A queue worker that reads a database continuously waiting for a job to run.
A Node application that acts as a daemon
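As mentioned above, here is a minimal sketch of a supervisord program definition for such a worker; the program name, command, and log paths are all placeholders:

    ; Illustrative supervisord config, e.g. /etc/supervisor/conf.d/queue-worker.conf
    ; autorestart=true keeps the worker alive: supervisord restarts it if it ever exits
    [program:queue-worker]
    command=/usr/bin/python /srv/app/worker.py
    autostart=true
    autorestart=true
    stdout_logfile=/var/log/queue-worker.out.log
    stderr_logfile=/var/log/queue-worker.err.log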
Cron
Cron allows you to run processes regularly, at fixed time intervals. You can, for example, run a process every minute, every 30 minutes, or at any interval you need.
When to use it: Use it when your process is not long-running (it does a task and exits) and you do not need it restarted automatically as with Supervisord (an example crontab entry follows the list), e.g.:
A task that collects logs every day and sends them as a gzip archive by email
A backup routine.
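For illustration, a crontab entry for the log-collection example might look like this (the script path is a placeholder; `crontab -e` opens the file for editing):

    # m  h  dom mon dow  command
    # run the hypothetical log-collection script every day at 02:30
    30 2 * * * /usr/local/bin/collect_and_mail_logs.sh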
Whichever you choose, there are many tutorials on the internet on how to configure both, so I won't go into the details here.

Heroku: Prevent worker process from restarting?

I have a Heroku worker set up to do a long-running job that iterates over long periods. However, whenever I update and deploy other files in the repo, this worker restarts, which is annoying. Is there any way to avoid this?
No. This behaviour is part of Heroku's Automatic Dyno Restarting.
You can't work around this. Instead, you need to build all parts of your app to be able to function properly despite the fact that all dynos will restart at least once every 24 hours or so, whether or not you deploy updates in your repo.
Most significantly, you need to build support for Graceful Shutdown into all your processes (e.g. web process and worker processes).
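As an illustration only, here is a minimal sketch of a SIGTERM-aware worker loop in C# (Heroku sends SIGTERM before restarting a dyno and force-kills it roughly 30 seconds later); the work and checkpoint functions are hypothetical, and the same pattern applies in any runtime:

    using System;
    using System.Threading;

    class Worker
    {
        static readonly CancellationTokenSource Stop = new CancellationTokenSource();

        static void Main()
        {
            // Request a graceful stop when the process is asked to exit, so
            // the current item can finish and progress can be checkpointed.
            AppDomain.CurrentDomain.ProcessExit += (s, e) => Stop.Cancel();
            Console.CancelKeyPress += (s, e) => { e.Cancel = true; Stop.Cancel(); };

            while (!Stop.IsCancellationRequested)
            {
                ProcessNextItem();   // hypothetical: one small unit of the long-running job
            }

            SaveCheckpoint();        // hypothetical: persist progress so the next dyno resumes here
        }

        static void ProcessNextItem() { Thread.Sleep(1000); }
        static void SaveCheckpoint() { }
    }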

Running batch applications on Cloudfoundry: using tasks instead of long-running processes

I would like to run a batch application (that is a short lived process that should not be restarted) on Pivotal CloudFoundry.
I am not sure how to do that. My current batch app is restarted repeatedly by Pivotal CF.
It seems there's a new CF primitive called a task - as opposed to a long-running process. Tasks are supposed to be available on CF 1.7 (see https://stackoverflow.com/a/35512113/536299).
I was not able to find relevant information in the CF documentation, nor could I figure out which version of Pivotal CF is currently being run...
Can someone please help?
I just got some relevant information regarding short-lived/one-off processes on CF. It currently seems to be very difficult to run them on CF.
This will change when CF v3's tasks become generally available.
Here is the information I was given:
Batch jobs are a little tricky on PWS and PCF because at the moment the platform expects your application to continue running forever. Even if the app exits successfully, the platform considers it to have crashed and will restart it. There is support in v3 of the platform for one-off tasks like batch jobs, so this will get easier in the future. For now, what you need to do is to make the app run forever. One option is to add a loop to the main method in the app; the loop would essentially run the batch job, pause for some set amount of time, and repeat indefinitely.
So the bottom line is: wait for CF v3 tasks.
See here for documentation about tasks: http://v3-apidocs.cloudfoundry.org/version/release-candidate/index.html#tasks
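A minimal sketch of that interim workaround (illustrative only; RunBatchJob stands in for whatever the batch application does, and the interval is arbitrary):

    using System;
    using System.Threading;

    class Program
    {
        static void Main()
        {
            // Wrap the batch job in an endless loop so the platform sees a
            // process that never exits and therefore never "crashes".
            while (true)
            {
                RunBatchJob();                           // hypothetical batch entry point
                Thread.Sleep(TimeSpan.FromMinutes(30));  // pause before the next run
            }
        }

        static void RunBatchJob() { }
    }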

Spring #Scheduled After A Server Restart

I'm creating a mechanism in my web server whereby a scheduled task will execute every 15 minutes and notify users if any activity has occurred within that time frame. It would work as follows:
Annotate a method with @Scheduled and schedule it to run every 15 minutes
When the task runs, scrape the database for any changes within 15 minutes of the current time
A couple problems I can see:
If I have to restart the server and it's down for longer than 15 minutes, I would need to look back longer than 15 minutes so that no activity is missed.
I'm running a number of Tomcat servers and only one of them needs to execute the task. Otherwise, duplicate emails will be sent to users.
Has anyone dealt with this before? I'm thinking that this should really be a task external to the web servers... that would solve the issue of duplicate emails being sent, but it wouldn't solve the server bounce issue.
Any ideas on how to solve would be greatly appreciated!
I would take the following steps to handle the scheduling:
On application startup, query the database for tasks (only those whose dirty flag indicates they still need to run) and schedule them.
On each run of a scheduled task, set the dirty flag to record that the task has run.
Because only the tasks selected by the dirty flag are retrieved, the issue of multiple emails should not occur, even on server startup.

Monitor server, process, services, Task scheduler status

I am wondering if there is a way to monitor these automatically. Right now, in our production/QA/dev environments, we have a bunch of services running that are critical to the application. We also have automated ETLs running on Windows Task Scheduler at a set time of day. Currently, I have to log into each server and check whether all the services are running fine, check the event logs for errors, check Task Scheduler to see whether the ETLs ran well, and so on, all manually. I am wondering if there is a tool out there that will do the monitoring for me and send emails only when something needs attention (like ETLs failing to run, services being stopped for whatever reason, or errors in the event log). Thanks for the help.
Paessler PRTG Network Monitor can do all of that. We have had very good experience with it.
http://www.paessler.com/prtg/features
Nagios is the best tool for monitoring. It checks the server status as well as the services defined on it, and if any service or the whole system goes down, it sends an email to the specified address.
See: http://nagios.org/
Thanks for the above information. I looked at those options, but they cost money, so here is the inexpensive way I addressed my concerns.
For my Windows Task Scheduler jobs that run every night, I installed this tool/service from CodePlex, which is working great.
http://motash.codeplex.com/documentation#CommentsAnchor
For Windows services, I am just configuring the "Recovery" tab in each service's properties with actions to take when it fails (such as restart the service, reboot, or run a program, which could send a notification email).
I built a simple tool (https://cronitor.io) for monitoring periodic/scheduled tasks. The name is a play on "cron" from the Unix world, but it is system/task agnostic. All you have to do is make an HTTP request to a unique tracking URL whenever your job runs. If your job doesn't check in according to the rules you define, it will send you an email/SMS message.
It also allows you to track the duration of your jobs by making calls at the beginning and end of your task. This can be really useful for long running jobs since you can be alerted if they start taking too long to run. For example, I once had a backup task that was scheduled every hour. About six months after I set it up it started taking longer than an hour to run!
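A minimal sketch of such check-in calls from inside a job (the tracking URLs and job name are placeholders; the real URLs come from the monitoring service):

    using System.Net.Http;
    using System.Threading.Tasks;

    class NightlyBackupJob
    {
        static async Task Main()
        {
            using (var http = new HttpClient())
            {
                // Ping once at the start and once at the end so the monitor
                // can track both completion and duration.
                await http.GetAsync("https://example.invalid/track/my-job/run");
                RunBackup();   // hypothetical long-running work
                await http.GetAsync("https://example.invalid/track/my-job/complete");
            }
        }

        static void RunBackup() { }
    }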
There is also https://eyewitness.io, which monitors server cron tasks, queues, and websites. It makes sure each of your cron jobs runs when it is supposed to, and alerts you if it failed to run.
