Azure co-located cache takes too long to start

I am working on an Azure solution (Azure SDK 2.1) with one web role (2 instances) and one worker role (2 instances). Both use co-located (in-role) caching. The problem is that the cache service on the worker role instances takes far too long to start: for several minutes, every call to the cache returns only DataCacheExceptions saying that the cache is temporarily unavailable, etc.
From your experience, is this normal? I would expect the cache service to be part of the "provisioned" environment and to be ready by the time the Run method is called.
Is there anything I can do to handle this? Maybe some "event" to know when the cache is ready, or a way to tell the Azure fabric to run my worker code only once the cache is ready?

We have run into the same issue, and most people have suggested a 2-3 minute sleep at the start of your role's OnStart as a workaround.
I am pretty sure there is an item on the Microsoft Azure backlog for this, and they don't really have a workaround other than waiting to use caching until it is ready.
I know this is not the most elegant solution, but in my talks with different people at Microsoft this has been their suggestion for the time being.
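For example, here is a minimal sketch of that wait-for-cache approach. The cache name ("default"), the probe key, and the five-minute cap are my own assumptions, not anything Microsoft prescribes; it simply polls until a cache call stops throwing:

```csharp
// Minimal sketch, not production code: block in OnStart until the co-located
// cache answers, assuming the cache client is configured in app.config.
using System;
using System.Threading;
using Microsoft.ApplicationServer.Caching;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        var deadline = DateTime.UtcNow.AddMinutes(5); // arbitrary cap

        while (DateTime.UtcNow < deadline)
        {
            try
            {
                var cache = new DataCache("default"); // assumed cache name
                cache.Get("warmup-probe");            // any key; success means the cache is up
                break;                                // cache is reachable, let Run() proceed
            }
            catch (DataCacheException)
            {
                Thread.Sleep(TimeSpan.FromSeconds(10)); // still warming up, try again
            }
        }

        return base.OnStart();
    }
}
```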

Are service fabric services entirely single-threaded?

I'm trying to get to grips with Service Fabric and I'm struggling a little bit. Two questions:
1. Are all Service Fabric service instances single-threaded? I created a stateless Web API, one instance, with a method that did a Task.Delay and then returned a string. Two requests to this service were served one after the other, not concurrently. Am I right in thinking, then, that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit: Thinking about this, it is probably to do with the setup of OWIN Web API. Could it be blocking by session? I assumed there is no session by default?
2. I have long-running operations that I need to perform in Service Fabric (they can take several hours). Is there a recommended pattern I can use for this? These are currently handled by a storage queue that triggers a WebJob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part, so I will comment on the second part: "long-running operations".
Long-running operations and workflows were being handled long before Service Fabric came about, so we can stand on the shoulders of giants and look at the design patterns software experts have been using for decades - for example, the famous and all-inclusive Process Manager. Mind you, this pattern is sometimes overkill; if that is the case for you, check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for reliable collections: those are an implementation detail when choosing a data structure to support the chosen design pattern.
I hope that helps.
With regard to your second point - it really depends on the nature of your long-running task.
Is your long-running task the kind of workload that runs on an isolated thread, depends on local OS/VM-level resources, and eventually comes back with a result (A)? Or is it the kind of long-running task that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long-running workloads (A), but more for writing horizontally scalable, highly available systems.
If you are absolutely keen on using Service Fabric (and your kind of workload tends to be more like B than A), I would definitely find a way to break those long-running tasks down into pieces that can be processed in parallel across the cluster. But even then, there are probably more appropriate technologies designed for this, such as Azure Batch.
P.S. If you are going to put a long-running process in the RunAsync method, you should design the workload so it is interruptible and its state is persisted in a way that can be resumed from another node in the cluster:
In a stateful service, only the primary replica has write access to state and thus is generally when the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is cancelled when a primary replica's role changes away from primary, as well as during the close and abort events.
P.P.S. Long-running operations are the devil when trying to write scalable systems. Try to tackle that now and save yourself the future pain if possible.
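To make that concrete, here is a minimal sketch of the Reliable Queue + RunAsync approach the question suggests, assuming a stateful service and a hypothetical "jobs" queue of string payloads; error handling and backoff are omitted:

```csharp
// Minimal sketch: drain a reliable queue in small, transactional steps.
// Because nothing commits until CommitAsync, a failover simply leaves the
// in-flight job queued for the next primary.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class LongRunner : StatefulService
{
    public LongRunner(System.Fabric.StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var jobs = await StateManager.GetOrAddAsync<IReliableQueue<string>>("jobs");

        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();

            using (ITransaction tx = StateManager.CreateTransaction())
            {
                ConditionalValue<string> job = await jobs.TryDequeueAsync(tx);
                if (job.HasValue)
                {
                    // Do one small, resumable step of work here. If this replica
                    // stops being primary, the transaction never commits and the
                    // job stays queued for the next primary.
                    await tx.CommitAsync();
                    continue;
                }
            }

            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
}
```

Keeping each unit of work small is what makes the job interruptible: cancellation between steps loses at most one uncommitted step.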
To the first point - this turned out to be purely a client issue. Chrome saw my requests as identical and so delayed the second request until the first got a response. Varying the parameters of the requests allowed them to be served concurrently.

Stateful application in Azure

The issue I have is that I'm using a third-party DLL for something (a very expensive operation); it's not serializable, and it takes a minute to spin up each time. It's needed on each call of a WCF service, and I can't keep it in memory (recycling) and can't keep it in a cache (not serializable).
I was wondering what alternatives (if any) there are. I was originally thinking about using a worker role, but then I read that they are recycled too. Then I considered a Windows service, but I'm hoping there is something better suited.
I'd like to think I'm not the only one with this issue, and that someone else has already solved it! :)
Why are you unable to use worker roles or web roles to keep the data generated by your process in memory? Neither of the two roles "flushes" its memory on a frequent basis. True, it is not guaranteed that reboots won't happen, but they are very rare, and checking to see IF your stateful data is empty and repopulating it when it is shouldn't be a big deal - the logic would work the same way on any server, whether it is a cloud service or a dedicated VM.
Edit: web roles and worker roles do not restart on any known cycle. However, by default IIS does recycle its application pools on a schedule; that timer can be changed or the recycle disabled via a startup script.
Furthermore, no such recycling happens in worker roles, so if you're running a worker role, the object will stay in memory as long as you don't recycle the server yourself and no (rare) Windows update comes along.
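As a rough illustration of that check-and-repopulate idea (ExpensiveEngine is a hypothetical stand-in for the non-serializable third-party type):

```csharp
// Rough illustration: keep the expensive object alive for the life of the
// process and rebuild it only after a (rare) recycle.
using System;
using System.Threading;

public class ExpensiveEngine { /* wraps the third-party DLL */ }

public static class EngineHolder
{
    // Lazy<T> guarantees the minute-long spin-up runs at most once per
    // process, even when concurrent WCF calls race for it. After a recycle,
    // the next call simply pays the spin-up cost once more.
    private static readonly Lazy<ExpensiveEngine> engine =
        new Lazy<ExpensiveEngine>(() => new ExpensiveEngine(),
                                  LazyThreadSafetyMode.ExecutionAndPublication);

    public static ExpensiveEngine Instance => engine.Value;
}
```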
HTH

What exactly happens when I change number of Azure role instances?

I observe the following weird behavior. I have an Azure web role deployed to the live Azure cloud. I click "Configure" in the Azure Management Portal and change the number of instances, and the portal shows some "activity". Now I open the browser, navigate to the URL assigned to my deployment, and start refreshing the page every two seconds or so. The page reloads fine many times, then for some time it stops reloading and the requests are rejected; after something like half a minute, the requests are handled normally again.
What is happening? Is the web server temporarily stopped? How do I change the number of instances so that HTTP requests to the role are handled at all times?
When you change the configuration, your current instances may be restarted. That is most likely why your website didn't respond for about 30 seconds.
Have a look at http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.serviceruntime.roleenvironment.changing.aspx and check whether it's the role restarting.
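The pattern from that page looks roughly like this (a sketch, not a recommendation: it recycles the instance only for changes other than instance count, so pure scaling operations are applied while the site keeps serving):

```csharp
// Sketch: react to the Changing event so pure topology (instance-count)
// changes are applied without a restart.
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        RoleEnvironment.Changing += (sender, e) =>
        {
            // e.Cancel = true forces the instance to recycle before the change
            // is applied; leave it false for instance-count-only changes.
            e.Cancel = e.Changes.Any(c => !(c is RoleEnvironmentTopologyChange));
        };

        return base.OnStart();
    }
}
```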
What you are doing is manual. Have you looked at the SDK for autoscaling Azure?
http://channel9.msdn.com/posts/Autoscaling-Windows-Azure-applications
Check out the demo at the 18-minute mark. It doesn't answer your question directly, but it's a much more configurable/dynamic way of scaling Azure.
Azure updates your roles one update domain at a time, so in theory you should see no downtime when updating the config (provided you have at least two instances). However, if you refresh the browser every couple of seconds, it's possible that your requests always go to the same instance due to keep-alive.
It would be interesting to know what the behavior is if you disable keep-alives for your web role. Note that this will have a performance impact, so you'll probably want to re-enable keep-alives after the exercise.
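One way to run that experiment without touching the server configuration is to probe from a small client with keep-alive turned off, so every request opens a fresh connection and can land on any instance. The URL here is a placeholder:

```csharp
// Small probe client: request the site every two seconds with keep-alive
// disabled and log any failures, to see whether downtime persists once
// connection reuse is out of the picture.
using System;
using System.Net;
using System.Threading;

class Probe
{
    static void Main()
    {
        while (true)
        {
            var request = (HttpWebRequest)WebRequest.Create("http://myapp.cloudapp.net/");
            request.KeepAlive = false; // no connection reuse between requests

            try
            {
                using (var response = (HttpWebResponse)request.GetResponse())
                {
                    Console.WriteLine("{0:HH:mm:ss} {1}", DateTime.Now, response.StatusCode);
                }
            }
            catch (WebException ex)
            {
                Console.WriteLine("{0:HH:mm:ss} FAILED: {1}", DateTime.Now, ex.Message);
            }

            Thread.Sleep(2000);
        }
    }
}
```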

Simulating background task on AppHarbor

I'm using System.Runtime.Caching.MemoryCache to simulate a repeated task in a running .NET MVC application deployed on AppHarbor.
Entries in the cache are added with a CacheItemPolicy that contains an AbsoluteExpiration offset and a RemovedCallback that calls a method and re-adds the item to the cache (as described here).
The MemoryCache is populated for the first time in Application_Start. It works fine locally but doesn't seem to work once deployed on AppHarbor (I also tried HttpRuntime.Cache, with the same result).
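For reference, the pattern looks roughly like this (the key, the interval, and DoWork are placeholders):

```csharp
// Illustrative version of the self-rescheduling cache entry: when the item
// expires, the removal callback does the work and re-adds the item, so the
// cycle repeats for as long as the process stays alive.
using System;
using System.Runtime.Caching;

public static class FakeScheduler
{
    public static void Schedule()
    {
        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(5),
            RemovedCallback = args =>
            {
                DoWork();    // the "background task"
                Schedule();  // re-add the item so the cycle repeats
            }
        };
        MemoryCache.Default.Add("task-trigger", DateTime.UtcNow, policy);
    }

    private static void DoWork() { /* ... */ }
}
```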
My application is running under a CANOE (free) account on AppHarbor that has only one worker. Does this mean that I won't be able to simulate the background task until I upgrade to a paid plan?
Thanks!
Your application has to have visitors every once in a while for this to work. Other than StillAlive, Pingdom is also a good bet for generating requests to your app. You should also take a look at MomentApp. We expect to have background tasks ready shortly.
I don't think upgrading will help; they are working on adding background jobs to AppHarbor, but to my knowledge they aren't available yet.
What about using a service like https://stillalive.com/ to periodically hit a page on your site that then spins up a new thread and starts running your background task? It's available as a free add-on.
I was thinking of doing something like this while waiting for the background task functionality to be available.
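A sketch of that ping-driven approach, with a flag so overlapping pings don't stack up duplicate workers (the controller and method names are made up):

```csharp
// Sketch: an MVC action for a pinger to hit. The Interlocked flag ensures at
// most one background batch runs at a time, no matter how often we're pinged.
using System.Threading;
using System.Threading.Tasks;
using System.Web.Mvc;

public class KeepAliveController : Controller
{
    private static int running; // 0 = idle, 1 = task in flight

    public ActionResult Ping()
    {
        if (Interlocked.CompareExchange(ref running, 1, 0) == 0)
        {
            Task.Run(() =>
            {
                try { RunBackgroundBatch(); }
                finally { Interlocked.Exchange(ref running, 0); }
            });
        }
        return new HttpStatusCodeResult(200);
    }

    private static void RunBackgroundBatch() { /* the simulated task */ }
}
```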

How do I hard stop an Azure role?

Here's my scenario: my Azure web role does a lot of work in OnStart() and produces a huge debug trace that is uploaded to Blob Storage.
Now OnStart() hangs for whatever reason. I look into Blob Storage and see that the trace has not been updated for several minutes already, so I decide the role is beyond repair and I want to shut it down immediately so that I can update the role with another package and start it again.
The problem is that when I hit "Stop" in the Management Portal it takes up to ten minutes to stop the role - I guess it tries to convince the role to stop gracefully and waits for several minutes.
Can I somehow make the role stop immediately without letting it stop gracefully?
I wonder if deleting the deployment (that's presumably what you're going to do after stopping it?) is faster, but I'm not sure. As far as I know, there's only one kind of "stop," so no, I don't think there's a way to force a faster stop.
Have a look at the Windows Azure Platform PowerShell Cmdlets.
They should give you at least the same functionality and probably more control over the actions. You could also request the current status, as it is not always reflected immediately in the Silverlight portal.
