Does an Azure role need to clean up local resources before terminating?

Suppose my Azure role is notified that it will be terminated soon and technically can clean up local resources after itself (temporary files for example). Should it do so?
I'm not asking about whether someone will see my leftover temporary files - just how my role can be a polite good Azure citizen.
Does it make sense for the role to clean up local resources or should it just leave everything as is?

Like Stuart said, there's no reason to do any local storage cleanup. You either leave it for yourself to use in the future (which is not guaranteed), or you have the local storage cleaned up automatically after your role instance shuts down.
What you do want to do during shutdown is release blob leases, close open sessions, shut down database connections, etc. You won't have this opportunity if the Guest OS (or Host OS) crashes, but you always want to handle graceful shutdowns when possible.
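A minimal Python sketch of that idea, not Azure SDK code: cleanup actions are registered as resources are acquired and run in reverse order when the shutdown notification arrives. The class name `ShutdownCleanup` and the two logged actions are hypothetical placeholders for real lease-release and connection-close calls.

```python
import contextlib


class ShutdownCleanup:
    """Collects cleanup actions and runs them once, newest first."""

    def __init__(self):
        self._stack = contextlib.ExitStack()

    def register(self, action, *args):
        # ExitStack.callback stores the callable; it runs when close() is called.
        self._stack.callback(action, *args)

    def run(self):
        # Callbacks run in reverse order of registration (LIFO).
        # close() empties the stack, so calling run() again does nothing.
        self._stack.close()


log = []
cleanup = ShutdownCleanup()
cleanup.register(log.append, "close database connection")  # hypothetical action
cleanup.register(log.append, "release blob lease")         # hypothetical action

# Call this from whatever hook your platform invokes on graceful shutdown.
cleanup.run()
print(log)
```

Registering at acquisition time means the shutdown handler never has to know which resources happen to be open at that moment.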

I can't think of any good reason why you should clean up things like temporary files during this shutdown.
Instead I just use the notification as a graceful way to shut down, hopefully avoiding leaving any jobs "half-finished".
For the issue of temporary files in particular, the LocalStorage feature has a "Clean on Role Recycle" property - you should probably set that to true.

Related

Stateful application in Azure

The issue I have is that I'm using a third-party DLL for something (a very expensive operation), it's not serializable, and it takes a minute to spin up each time. It's needed on each call of a WCF service and I can't keep it in memory (recycling), and I can't keep it in a cache (unserializable).
I was wondering what alternatives (if any) there are? I was originally thinking about using a Worker Role, but then I read that they are recycled too. Then I considered a Windows service, but I'm hoping there is something better suited.
I'd like to think I'm not the only one with this issue, and that someone else has already solved this issue! :)
Why are you unable to use Worker Roles or Web Roles to keep the data generated by your process in memory? Neither of the two roles "flushes" its memory on a frequent basis. True, there is no guarantee that reboots never happen, but they are very rare. Checking whether your stateful data is empty and repopulating it when it is shouldn't be a big deal, and the logic would work the same way on any server, whether it is a Cloud Service or a dedicated VM.
Edit: Web roles or worker roles do not restart on any known cycle. However, by default IIS does recycle on a schedule. This timer can be changed or disabled via a startup script.
Furthermore, no such recycling happens in worker roles. So, if you're running a worker role, the thing will stay in memory as long as you don't recycle the server yourself or a rare Windows update happens.
HTH
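To illustrate the "check whether it is empty, then repopulate" logic, here is a rough Python sketch (the question is about .NET, so treat it as language-neutral pseudocode that happens to run). `LazyResource` and `slow_factory` are made-up names; the factory stands in for the minute-long third-party initialisation.

```python
import threading


class LazyResource:
    """Builds an expensive object on first use and keeps it in memory."""

    def __init__(self, factory):
        self._factory = factory      # callable that does the slow setup
        self._lock = threading.Lock()
        self._value = None

    def get(self):
        # Fast path: already built, no locking needed.
        if self._value is not None:
            return self._value
        with self._lock:
            # Re-check inside the lock so only one thread runs the factory.
            if self._value is None:
                self._value = self._factory()
            return self._value


calls = []


def slow_factory():
    # Stand-in for the expensive third-party initialisation.
    calls.append(1)
    return object()


engine = LazyResource(slow_factory)
first = engine.get()
second = engine.get()
print(first is second, len(calls))
```

After a reboot or recycle the holder is simply empty again, so the first request pays the start-up cost once and later requests reuse the same object.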

Stopped service does not release its resources?

I'm trying to deploy a patch to a service I created and replace the service file.
For that reason I need to stop the service so the file will be released.
I'm using sc \\remote stop svcname, then I query the service using sc \\remote query svcname until I see that its state is STOPPED.
At this point the service file should be unlocked, and to be on the safe side I also delete the service using sc \\remote delete svcname.
Still, it doesn't seem to release the file and any deletion or change attempt fails.
I know one solution might be polling the file repeatedly, but I want to avoid this method.
Any suggestions?
Windows doesn't ensure that the process providing the service terminates when the service is stopped (the process may provide more than one service). It just considers the service stopped once the service has handled the stop message sent to it.
So if the service process has a bug and does not properly release resources, they may still be locked. I would probably wait a little and then simply terminate the process.
There is also a tool from Microsoft called handle.exe (this is the command-line version; there is also a GUI one) that can list which processes hold the file open. It should be possible to get the same information programmatically, but I am not sure of the exact calls to make (and you need administrator privileges; you have to give them to the tool too). That way you can check whether the file is open and by which process, then wait for that process to terminate or force-terminate it if you didn't know which one it was.
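The "wait a little, then terminate the process" step could be scripted roughly like this in Python. It is a sketch under assumptions: it relies on `sc queryex` printing a `PID : <number>` line and on `taskkill /S <server> /PID <pid> /F` being permitted against the remote machine. The function names are made up, and nothing here was taken from the original poster's setup.

```python
import re
import subprocess


def parse_service_pid(queryex_output):
    """Return the PID found in `sc queryex` output, or None if absent or 0."""
    match = re.search(r"^\s*PID\s*:\s*(\d+)", queryex_output, re.MULTILINE)
    if not match:
        return None
    pid = int(match.group(1))
    # A PID of 0 is treated as "no running process".
    return pid or None


def kill_service_process(server, service):
    """Look up the service's host process on `server` and force-terminate it."""
    out = subprocess.run(
        ["sc", rf"\\{server}", "queryex", service],
        capture_output=True, text=True, check=True,
    ).stdout
    pid = parse_service_pid(out)
    if pid is not None:
        subprocess.run(
            ["taskkill", "/S", server, "/PID", str(pid), "/F"], check=True
        )
    return pid


# Hand-written sample of the assumed output format, for illustration only.
sample = """SERVICE_NAME: svcname
        STATE              : 3  STOP_PENDING
        PID                : 4321
        FLAGS              :
"""
print(parse_service_pid(sample))
```

Query the PID before deleting the service, and keep in mind that killing a shared host process takes down every service it provides.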

How do I hard stop an Azure role?

Here's my scenario: my Azure web role does a lot of work in OnStart() and produces a huge debug trace that is uploaded to Blob Storage.
Now OnStart() hangs for whatever reason and I look into Blob Storage and see that trace has not been updated for several minutes already. So I decide the role is beyond repair and I want to shut it down immediately so that I can update the role with another package and start it again.
The problem is that when I hit "Stop" in the Management Portal it takes up to ten minutes to stop the role. I guess it tries to convince the role to stop gracefully and waits for several minutes.
Can I somehow make the role stop immediately without letting it stop gracefully?
I wonder if deleting the deployment (that's presumably what you're going to do after stopping it?) is faster, but I'm not sure. As far as I know, there's only one kind of "stop," so no, I don't think there's a way to force a faster stop.
Have a look at the Windows Azure Platform PowerShell Cmdlets.
It should give you at least the same functionality and probably more control over the actions. You could also request the current status as it is not always reflected immediately in the Silverlight portal.

Is it possible for RoleEntryPoint.OnStart() to be run twice before the host machine is cleaned up?

I plan to insert some initialization code into OnStart() method of my class derived from RoleEntryPoint. This code will make some permanent changes to the host machine, so in case it is run for the second time on the same machine it will have to detect those changes are already there and react appropriately and this will require some extra code on my part.
Is it possible OnStart() is run for the second time before the host machine is cleared? Do I need this code to be able to run for the second time on the same machine?
"Is it possible OnStart() is run for the second time before the host machine is cleared?"
Not sure how to interpret that.
As far as permanent changes go: Any installed software, registry changes, and other modifications should be repeated with every boot. If you're writing files to local (non-durable) storage, you have a good chance of seeing those files next time you boot, but there's no guarantee. If you are storing something in Windows Azure Storage (blobs, tables, queues) or SQL Azure, then your storage changes will persist through a reboot.
Even if you were guaranteed that local changes would persist through a reboot, these changes wouldn't be seen on additional instances if you scaled out to more VMs.
I think the official answer is that the role instance will not run its job more than once in each boot cycle.
However, I've seen a few MSDN articles that recommend you make startup tasks idempotent - e.g. http://msdn.microsoft.com/en-us/library/hh127476.aspx - so probably best to add some simple checks to your code that would anticipate multiple executions.
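One simple way to get that kind of idempotence is a marker file, sketched here in Python rather than in role code. `run_once`, the marker path and the logged action are all hypothetical; the point is only the check-then-act shape.

```python
import os
import tempfile


def run_once(marker_path, action):
    """Run `action` only if `marker_path` does not exist yet.

    Returns True if the action ran, False if it was skipped.
    """
    if os.path.exists(marker_path):
        return False
    action()
    # Write the marker only after the action succeeded, so a failed
    # attempt is retried on the next start.
    with open(marker_path, "w") as marker:
        marker.write("done\n")
    return True


# Demonstration in a throwaway directory (hypothetical paths).
workdir = tempfile.mkdtemp()
marker = os.path.join(workdir, "init.done")
runs = []

first = run_once(marker, lambda: runs.append("configured"))
second = run_once(marker, lambda: runs.append("configured"))
print(first, second, runs)
```

Because local storage may not survive a reboot, the marker can disappear too, so the action itself should still be safe to repeat on a machine that already carries the changes.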

Single instance Amazon EC2

We're running a lightweight web app on a single EC2 server instance, which is fine for our needs, but we're wondering about monitoring and restarting it if it goes down.
We have a separate non-Amazon server we'd like to use to monitor the EC2 and start a fresh instance if necessary and shut down the old one. All our user data is on Elastic Storage, so we're not too worried about losing anything.
I was wondering if anyone has any experience of using EC2 in this way, and in particular of automating the process of starting the new instance? We have no problem creating something from scratch, but it seems like it should be a solved problem, so I was wondering if anyone has any tips, links, scripts, tutorials, etc to share.
Thanks.
You should have a look at Puppet and its support for AWS. I would also look at the RightScale AWS library as well as this post about starting a server with the RightScale scripts. You may also find this article on web serving with EC2 useful. I have done something similar to this but without the external monitoring: the node monitored itself and shut down when it was no longer needed, then a new one would start up later when there was more work to do.
Couple of points:
You MUST MUST MUST back up your Amazon EBS volume.
They claim "better" reliability, but not 100%, and it's SEVERAL orders of magnitude off of S3's "11 9's" of durability. S3 durability >> EBS durability. That's a fact. EBS supports a "snapshots" feature which backs up your storage efficiently and incrementally to S3. Also, with EBS snapshots, you only pay for the compressed deltas, which is typically far far less than the allocated volume size. In another life, I've sent lost-volume emails to smaller customers like you who "thought" that EBS was "durable" and trusted it with the only copy of a mission-critical database... it's heartbreaking.
Your Q: automating start-up of a new instance
The design path you mention is relatively untraveled; here's why... Lots of companies run redundant "hot-spare" instances where the second instance is booted and running. This allows rapid failover (seconds) in the event of "failure" (could be hardware or software). The issue with a "cold-spare" is that it's harder to keep the machine up to date and ready to pick up where the old box left off. More important, it's tricky to VALIDATE that the spare is capable of successfully recovering your production service. Hardware is more reliable than untested software systems. TEST TEST TEST. If you haven't tested your fail-over, it doesn't work.
The simple automation of starting a new EBS-backed instance is easy, bordering on trivial. It's just a one-line bash script calling the EC2 command-line tools. What's tricky is everything on top of that. Such a solution pretty much implies a fully 100% automated deployment process. And this is all specific to your application. Can your app pull down all the data it needs to run (maybe it's stored in S3)? Can you kill your instance today and boot a new instance with 0.000 manual setup/install steps?
Or, you may be talking about a scenario I'll call "re-instancing an EBS volume":
EC2 box dies (root volume is EBS)
Force detach EBS volume
Boot new EC2 instance with the EBS volume
... That mostly works. The gotchas:
Doesn't protect against EBS failures, either total volume loss or an availability loss
Recovery time is O(minutes) assuming everything works just right
Your services need to be configured to restart automatically. It does no good to bring the box back if Nginx isn't running.
Your DNS routes or other services or whatever need to be ok with the IP-address changing. This can be worked around with ElasticIP.
How are your host SSH keys handled? Same name, new host key can break SSH-based automation when it gets the strong-warning for host-key-changed.
I don't have proof of this (other than seeing it happen once), but I believe that EC2/EBS _already_does_this_ automatically for boot-from-EBS instances
Again, the hard part here is on your plate. Can you stop your production service today and bring it up RELIABLY on a new instance? If so, the EC2 part of the story is really really easy.
As a side point:
All our user data is on Elastic Storage, so we're not too worried about losing anything.
I'd strongly suggest regularly snapshotting your EBS (Elastic Block Store) volumes to S3 if you are not doing that already.
You can use an autoscale group with a min/max/desired quantity of 1. Place the instance behind an ELB and have the autoscale group be triggered by the ELB healthy node count. This gives you built-in monitoring through CloudWatch and the ELB health check. Any time there is an issue, the instance will be replaced by the autoscale service.
If you have not checked 'Protect against accidental termination' you might want to do so.
Even if you have disabled 'Detailed Monitoring' for your instance, you should still see the 'StatusCheckFailed' metric for it, and you can configure an alarm on that metric in the CloudWatch dashboard.
Your application (hosted on a different server) should receive the alarm and start the instance using the AWS API (or CLI).
Since you have protected against accidental termination you would never need to spawn a new instance.
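For the external-monitor variant, the watchdog running on the separate server might look something like this Python sketch. It deliberately leaves out the AWS call: `recover` is a placeholder for whatever starts or replaces the instance (API or CLI), and `http_probe`, `watch`, the thresholds and the intervals are all assumptions, not a tested production setup.

```python
import time
import urllib.request


def http_probe(url, timeout=5):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts derive from OSError
        return False


def watch(probe, recover, max_failures=3, interval=30,
          max_checks=None, sleep=time.sleep):
    """Call `recover()` after `max_failures` consecutive failed probes.

    Returns the number of times `recover` was invoked.
    """
    failures = recoveries = checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                recover()
                recoveries += 1
                failures = 0
        sleep(interval)
    return recoveries


# Dry run with scripted probe results instead of a real server.
results = iter([True, False, False, False, True])
restarts = []
count = watch(lambda: next(results), lambda: restarts.append("restart"),
              max_checks=5, interval=0, sleep=lambda seconds: None)
print(count, restarts)
```

Requiring several consecutive failures keeps a single dropped request from triggering a restart. In real use you would pass something like `lambda: http_probe("http://your-app.example/health")` as the probe, where that URL is a placeholder.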

Resources