Stateful application in Azure

Stateful application in Azure - caching

The issue I have is that I'm using a third party dll for something (very expensive operation), it's not serializable, and it takes a minute to spin up each time. It's needed on each call of a WCF service and I can't keep it in memory (recyling), and I can't keep it in a cache (unserializable).
I was wondering what alternatives (if any) there are? I was originally thinking about using a Worker Role, but then I read that they are recycled too. Then I considered a Windows service, but I'm hoping there is something better suited.
I'd like to think I'm not the only one with this issue, and that someone else has already solved this issue! :)

Why are you unable to use Worker Roles or Web Roles to keep the data generated by yoru process in memory? Neither of the two roles "flushes" it's memory on a frequent basis. True, that it is not guaranteed that reboots do not happen, but those reboots are very rare and checking to see IF your statefull data is empty and then repopulating it when it is, shouldnt be a big deal and the logic would work on any server the same way, whether it is a Cloud Service or a dedicated VM.
Edit: Web roles or worker roles do not restart on any known cycle. However, by default IIS does recycle on a schedule. This timer can be changed or disabled via a startup script.
Furthermore, no such recycling happens in worker roles. So, if you're running a worker role, the thing will stay in memory as long as you dont recycle the server yourself or a rare windows update happens
HTH

Related

Is there a way to update cached in-memory value on all running instance of a serverless function? (AWS,Google,Azure or OpenWhisk)

Suppose I am running a serverless function with a global state variable which is cached in memory. Assuming that the value is cached on multiple running instances, how an update to the global state would be broadcasted to every serverless instance with the updated value?
Is this possible in any of the serverless framework?

It depends on the serverless framework you're using, which makes it hard to give a useful answer on Stack Overflow. You'll have to research each of them. And you'll have to review this over time because their underlying implementations can change.
In general, you will be able to achieve your goal as long as you can open up a bidirectional connection from each function instance so that your system outside the function instances can send them updates when it needs to. This is because you can't just send a request and have it reach every backing instance. The serverless frameworks are specifically designed to not work that way. They load balance your requests to the various backing instances. And it's not guaranteed to be round robin, so there's no way for you to be confident you're sending enough duplicate requests for each of the backing instances to have been hit at least once.
However, there is something also built into most serverless frameworks that may stop you, even if you can open up long lives connections from each of them that allow them to be reliably messaged at least once each. To help keep resources available for functions that need them, inactive functions are often "paused" in some way. Again, each framework will have its own way of doing this.
For example, OpenWhisk has a configurable "grace period" where it allows CPU to be allocated only for a small period of time after the last request for a container. OpenWhisk calls this pausing and unpausing containers. When a container is paused, no CPU is allocated to it, so background processing (like if it's Node.js and you've put something onto the event loop with setInterval) will not run and messages sent to it from a connection it opened will not be responded to.
This will prevent your updates from reliably going out unless you have constant activity that keeps every OpenWhisk container not only warm, but unpaused. And, it goes against the interests of the folks maintaining the OpenWhisk cluster you're deploying to. They will want to pause your container as soon as they can so that the CPU it consumed can be allocated to containers not yet paused instead. They will try to tune their cluster so that containers remain unpaused for a duration as short as possible after a request/event is handled. So, this will be hard for you to control unless you're working with an OpenWhisk deployment you control, in which case you just need to tune it according to your needs.
Network restrictions that interfere with your ability to open these connections may also prevent you from using this architecture.
You should take these factors into consideration if you plan to use a serverless framework and consider changing your architecture if you require global state that would be mutated this way in your system.
Specifically, you should consider switching to a stateless design where instead of caching occurring in each function instance, it occurs in a shared service designed for fast caching, like Redis or Memcached. Then each function can check that shared caching service for the data before retrieving it from its source. Many cloud providers who provide serverless compute options also provide managed databases like these. So you can often deploy it all to the same place.
Also, you could switch, if not to a stateless design, a pull model for caching instead of a push model. Instead of having updates pushed out to each function instance to refresh their cached data, each function would pull fresh data from its source when they detect that the data stored in their memory has expired.

Are service fabric services entirely single-threaded?

I'm trying to get to grips with service fabric and I'm struggling a little bit. Some questions:
are all service fabric service instances single-threaded? I created a stateless web api, one instance, with a method that did a Task.Delay, then returned a string. Two requests to this service were served one after the other, not concurrently. So am I right in thinking then that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit Thinking about this, it is probably to do with the set up of OWIN Wep Api. Could it be it is blocking by session? I assumed there is no session by default?
I have long-running operations that I need to perform in service fabric (that can take several hours). Is there a recommended pattern that I can use for this in service fabric? These are currently handled using a storage queue that triggers a webjob. Maybe something with Reliable Queues and a RunAsync loop?

It seems you handled the first part so I will comment on the second part: "long-running operations".
We can see long running operations / workflows being handled far before service fabric came about. For this reason, we can build on the shoulders of giants by looking on the design patterns that software experts have been using for decades. For example, the famous and all inclusive Process Manager. Mind you that this pattern is sometimes an overkill. If it is in your case, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections, those are implementation details when choosing a data structure supporting the chosen design pattern.
I hope that helps

With regards to your second point - It really depends on the nature of your long running task.
Is your long running task the kind of workload that runs on an isolated thread that depends on local OS/VM level resources and eventually comes back with a result (A)? or is it the kind of long running task that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long running workloads (A), but more for writing horizontally-scalable, highly-available systems.
If you were absolutely keen on using service fabric (and your kind of workload tends to be more like B than A) I would definitely find a way to break down those long running tasks that could be processed in parallel across the cluster. But even then, there is probably more appropriate technologies designed for this such as Azure Batch?
P.s. If you are going to put a long running process in the RunAsync method, you should design the workload so it is interruptable and its state can be persisted in a way that can be resumed from another node in the cluster
In a stateful service, only the primary replica has write access to
state and thus is generally when the service is performing actual
work. The RunAsync method in a stateful service is executed only when
the stateful service replica is primary. The RunAsync method is
cancelled when a primary replica's role changes away from primary, as
well as during the close and abort events.
P.s.s Long running operations are the devil when trying to write scalable systems. Try and tackle that now and save yourself the future pain if possibe.

To the first point - this is purely a client issue. Chrome saw my requests as indentical and so delayed the 2nd request until the 1st got a response. Varying the parameter of the requests allowed them to be served concurrently.

Azure co-located cache starting too long

I am working on Azure solution (Azure SDK 2.1) with one web role (2 instances) and one worker role (2 instances). Both are using co-located (in-role) caching. The problem is that cache service on the worker role instances starting way too long - for several minutes every call to cache returns only DataCacheException-s saying that cache is temporary unavailable etc.
From your experience, is this normal? I think that cache service should be part of the "provisioned" environment, and should be already ready when Run method is called.
Is there anything I can do to handle this? Maybe some "event" to know when cache is ready? A way to say azure fabric to run my worker code only when cache is ready, etc. ?

We have run into the same issue, and most people have suggested a 2-3 minute sleep in your OnRun event as a work around.
I am pretty sure there is an item on the Microsoft Azure backlog for this, and they don't really have a work around other than waiting to use caching until it is ready.
I know this is not the most elegant solution, but in my talks with different people at Microsoft this has been their suggestion for the time being.

MVC3 memory management

What is the best way to check memory usage in an ASP.NET MVC3 application?
I have been told by my hosting provider to recyle the IIS application pool every so often to improve the speed of the site. Is this what is 'recommended practice'? Surely I shouldn't need to restart my application every so often? I'd much rather find out if it is an issue with memory usage in my application and correct it. So any tips & best practices you use would be quite helpful too.
The application is based on ASP.NET MVC3, C# and EF Code First. Any guidance, links appreciated.
EDIT:
I found this page after I posted, which is quite useful. But I'd still like to hear any other views.
ASP.NET MVC and EF Code First Memory Usage
Thank you

I have a site that never recycles (until the machine is rebooted weekly)
Your application generally should keep performing fine. If it doesn't, there is some leak.
This can occur because
Cache never expires
Cache never expires
Session storage keeps growing and never times out
ObjectContexts are never disposed and kept in the session, etc
Objects that should be disposed aren't
Objects that are created via a dependency injection container aren't setup to release after each request, and thus potentially have internal collections that keep growing.
There are more causes - but these are a few main ones.
So the question really is 'there is no best practice - it depends on your app'
If you are worried about current sessions during a restart, keep in mind a restart can be quick and current requests are allowed to finish (sometimes) and forms authentication tokens will survive the restart, however sessions will not unless you configure an out of process state server.
If your memory usage keeps growing, then setup a restart schedule, otherwise do once a week or never - or setup once memory goes to XYZ then reset. ASP.NET will restart automatically once a certain threshold is reached as well based on what the hoster has setup on memoryLimit:
http://msdn.microsoft.com/en-us/library/7w2sway1.aspx

By default IIS recycles the application pool automatically at an interval (I think is 29 hours or so) but that is surely set by the host, no matter how little or how much memory you're the process is using. THe recycling trigger can be a time interval or when the process hits a certain memory usage limit. I'm sure any shared host has both of them set.
About memory usage, you can use the GC.GetTotalMemory method which will give you an approximate usage. Even when using Perfmon the readings aren't very accurate but it gives you an idea.
//global.asax.cs
void Application_EndRequest(object o,EventArgs a)
{
var ctype=Context.Response.Headers["Content-Type"];
if (ctype == null || !ctype.Contains("text/html")) return;
Context.Response.Write(string.format("<p>Memory usage: {0}</p>",GC.GetTotalMemory(false)));
}
Be aware that you'll see the usage increasing increasing until the GC kicks in and the usage will drop to a more 'realistic' value.
If you have the money I recommend a specialized tool such as the Memory profiler
Other things you can do to at least be ready if the application has memory or performance problems:
Proper layering of the application, means you can refactor the more inefficient parts without affecting the others.
The Repository pattern will be very helpful, because you can start using EF , find out that EF uses to much memory (like in the link you've found), but then you could switch the repository implementation to use PetaPoco or Dapper.net.
In general an OR\M is more of a heavy library, if the application doesn't need ORM features but just a quick way to work with a db, use from the beginning a mico-Orm like those mentioned above.
Always dispose objects implementing IDisposable.
When dealing with large db records, use pagination. It's good for both server resources usage and user experience
Apply the YAGNI (You Aint Gonna Need It) principle as much as possible, this somehow implies a bit of TDD :)

How do I hard stop an Azure role?

Here's my scenario: my Azure web role does a lot of work in OnStart() and produces a huge debug trace that is uploaded to Blob Storage.
Now OnStart() hangs for whatever reason and I look into Blob Storage and see that trace has not been updated for several minutes already. So I decide the role is beyond repair and I want to shut it down immediately so that I can update the role with another package and start it again.
The problem is when I hit "Stop" in the Management Portal it takes up to ten minutes to stop the role - I guess it tries to convince the role to stop gracefully and wait for several minutes.
Can I somehow make the role stop immediately without letting it stop gracefully?

I wonder if deleting the deployment (that's presumably what you're going to do after stopping it?) is faster, but I'm not sure. As far as I know, there's only one kind of "stop," so no, I don't think there's a way to force a faster stop.

Have a look # Windows Azure Platform PowerShell Cmdlets
It should give you at least the same functionality and probably more control over the actions. You could also request the current status as it is not always reflected immediately in the Silverlight portal.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio