How to deploy code changes across microservices? (Spring)

We have 10 instances of the same microservice, registered via Eureka service discovery, with calls routed to them through a gateway. We want to deploy code changes across these 10 instances, but the deployment should be atomic, meaning that at no point in time should two instances be running different code.
A simple strategy could be: bring down 9 of the instances --> deploy the changes to them --> bring them back up --> bring down the remaining instance and, after deploying the change, bring it up again.
Is this the ideal strategy to follow in a production environment, or are there specific patterns for this?
The answers on blogs seem to discuss microservice patterns, but none address the scenario where some of the instances are running a newer code version while others are yet to be updated.

The ideal strategy is to spin up a few new instances and start shifting requests to them progressively. The load balancer can do IP address pinning so that, from a particular point in time onward, a given client only gets replies from the new instances.
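As a concrete illustration of that progressive shift (a hedged sketch, not the poster's exact setup): with an AWS Application Load Balancer you can put the old and new instance sets in separate target groups and move traffic over by adjusting weights, with stickiness enabled so a client keeps hitting the same version. Note that this stickiness is cookie-based, so it approximates rather than exactly implements IP pinning. The listener and target group ARNs below are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

# Placeholder ARNs -- substitute your own listener and target groups.
LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/my-gateway/..."
OLD_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/old-version/..."
NEW_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/new-version/..."

def shift_traffic(new_weight: int) -> None:
    """Send `new_weight`% of traffic to the new target group, the rest to the old one."""
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{
            "Type": "forward",
            "ForwardConfig": {
                "TargetGroups": [
                    {"TargetGroupArn": OLD_TG_ARN, "Weight": 100 - new_weight},
                    {"TargetGroupArn": NEW_TG_ARN, "Weight": new_weight},
                ],
                # Pin a client to whichever target group first served it
                # (cookie-based, so an approximation of IP pinning).
                "TargetGroupStickinessConfig": {"Enabled": True, "DurationSeconds": 3600},
            },
        }],
    )

# Ramp up gradually, checking dashboards between steps.
for weight in (10, 50, 100):
    input(f"Press Enter to send {weight}% of traffic to the new version...")
    shift_traffic(weight)
```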

In the ideal production world, your atomicity requirement is NOT there. Generally we deploy new code to, say, 10% of the servers, see how it performs in terms of exceptions and latency numbers, and if all is good we keep increasing that percentage.
But I completely understand that for some releases (for example certain DB changes, though there are solutions even for that, which is a separate discussion) or for some scenarios we CANNOT have multiple code versions running. The first question to ask for any deployment is: what is the allowed downtime?
Let us assume you need minimum downtime. Then the solution is to deploy to another 10 servers and test them out, and once everything is OK, point your ELB at the new servers. Note that there will be a few minutes of downtime here, because of the atomicity requirement.
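A minimal sketch of that cut-over step, assuming a classic ELB and pre-created instances (the load balancer name and instance IDs are placeholders): deregister the old fleet first, then register the freshly deployed and tested fleet, so only one code version ever receives traffic.

```python
import boto3

elb = boto3.client("elb", region_name="us-east-1")

LB_NAME = "my-service-elb"            # placeholder load balancer name
OLD_IDS = ["i-0aaaaaaaaaaaaaaaa"]     # instances currently serving the old version
NEW_IDS = ["i-0bbbbbbbbbbbbbbbb"]     # instances freshly deployed and tested

# Take the old fleet out first so both versions never serve traffic together
# (this gap is where the brief downtime from the atomicity requirement comes from).
elb.deregister_instances_from_load_balancer(
    LoadBalancerName=LB_NAME,
    Instances=[{"InstanceId": i} for i in OLD_IDS],
)

# Then attach the new fleet.
elb.register_instances_with_load_balancer(
    LoadBalancerName=LB_NAME,
    Instances=[{"InstanceId": i} for i in NEW_IDS],
)
```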

Related

NServiceBus: is this an illusion or a best practice?

Given an NServiceBus microservice that uses MSMQ: when I deploy a few instances of that service onto the same machine, am I scaling out my application? Am I improving performance, or is one instance enough? Should I instead have a more powerful machine to handle messages?
No, running multiple instances on a single machine will not make things run faster; it only makes execution less efficient.
However, it might be that a single instance isn't giving you the expected performance even though your system monitoring indicates there are plenty of resources available but unused. In that case you might want to tweak the configuration of your NServiceBus endpoint by configuring the amount of allowed parallel message execution.
The following link shows how you can increase the concurrency:
https://docs.particular.net/nservicebus/operations/tuning
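The actual setting lives in the NServiceBus endpoint configuration described at the link above; as a language-agnostic illustration of what "allowed parallel message execution" means, here is a rough Python sketch of the same idea: a semaphore that caps how many messages one endpoint instance processes at once.

```python
import asyncio

MAX_CONCURRENCY = 4  # analogous to the endpoint's concurrency limit

semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

async def handle(message: str) -> None:
    # Only MAX_CONCURRENCY handlers run at the same time; the rest wait here.
    async with semaphore:
        await asyncio.sleep(0.1)  # stand-in for real message processing
        print(f"processed {message}")

async def main() -> None:
    # Simulate a batch of messages arriving from the queue.
    await asyncio.gather(*(handle(f"msg-{i}") for i in range(20)))

asyncio.run(main())
```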
You can scale out further by actually using multiple machines, but if all these endpoints share the same central database, your network or database server can easily become the bottleneck. If you consider deploying or scaling out your endpoints across multiple machines, make sure that any storage solutions are also scaled out so they do not become your bottleneck.
Zero downtime upgrades/deployments
The only reason to have multiple instances on the same box is, for example, when deploying a new version: you can temporarily run the current and the new version side by side to achieve zero-downtime deployments.

Are Service Fabric services entirely single-threaded?

I'm trying to get to grips with Service Fabric and I'm struggling a little bit. Some questions:
Are all Service Fabric service instances single-threaded? I created a stateless Web API, one instance, with a method that did a Task.Delay and then returned a string. Two requests to this service were served one after the other, not concurrently. Am I right in thinking, then, that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit: thinking about it, this is probably to do with the setup of the OWIN Web API. Could it be blocking by session? I assumed there is no session by default.
I have long-running operations that I need to perform in Service Fabric (they can take several hours). Is there a recommended pattern I can use for this in Service Fabric? These are currently handled using a storage queue that triggers a WebJob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part, so I will comment on the second part: long-running operations.
Long-running operations and workflows were being handled well before Service Fabric came about. For this reason, we can stand on the shoulders of giants by looking at the design patterns that software experts have been using for decades, for example the well-known and comprehensive Process Manager. Mind you, this pattern is sometimes overkill; if that is the case for you, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections, those are implementation details: the choice of a data structure to support the chosen design pattern.
I hope that helps.
With regard to your second point, it really depends on the nature of your long-running task.
Is your long-running task the kind of workload that runs on an isolated thread, depends on local OS/VM-level resources, and eventually comes back with a result (A)? Or is it the kind that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long-running workloads of type (A), but more for writing horizontally scalable, highly available systems.
If you are absolutely keen on using Service Fabric (and your kind of workload tends to be more like B than A), I would definitely find a way to break those long-running tasks down into pieces that can be processed in parallel across the cluster. But even then, there are probably more appropriate technologies designed for this, such as Azure Batch.
P.S. If you are going to put a long-running process in the RunAsync method, you should design the workload so it is interruptible and its state is persisted in a way that lets it be resumed from another node in the cluster (a rough sketch of this appears after the quote below):
In a stateful service, only the primary replica has write access to state and thus is generally when the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is cancelled when a primary replica's role changes away from primary, as well as during the close and abort events.
P.P.S. Long-running operations are the devil when trying to write scalable systems. Try to tackle that now and save yourself the future pain, if possible.
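To make the P.S. above concrete, here is a rough, framework-agnostic sketch (not the Service Fabric C# API) of a long-running loop that checkpoints its progress and honours a cancellation signal, so another node can pick up where it left off. The checkpoint store is a plain file here purely for illustration; in Service Fabric it would be a reliable collection, and the cancellation signal would be RunAsync's cancellation token.

```python
import json
import os
import signal

CHECKPOINT_FILE = "progress.json"   # illustrative stand-in for a reliable collection
TOTAL_STEPS = 1_000_000

cancelled = False

def request_cancel(signum, frame):
    # Analogous to the RunAsync cancellation token being signalled.
    global cancelled
    cancelled = True

signal.signal(signal.SIGTERM, request_cancel)

def load_checkpoint() -> int:
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["next_step"]
    return 0

def save_checkpoint(next_step: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"next_step": next_step}, f)

def do_one_unit_of_work(step: int) -> None:
    pass  # placeholder for one small, repeatable chunk of the real work

def run() -> None:
    step = load_checkpoint()          # resume wherever the previous owner stopped
    while step < TOTAL_STEPS and not cancelled:
        do_one_unit_of_work(step)
        step += 1
        if step % 1000 == 0:
            save_checkpoint(step)     # persist progress often enough to limit rework
    save_checkpoint(step)

if __name__ == "__main__":
    run()
```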
To the first point: this was purely a client issue. Chrome saw my requests as identical and so delayed the second request until the first got a response. Varying a parameter of the requests allowed them to be served concurrently.
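For anyone reproducing that test, here is a quick hedged sketch (the URL is a placeholder) of firing two requests in parallel with a distinct query parameter so the client cannot treat them as identical and serialize them:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/api/values"   # placeholder endpoint

def call(i: int) -> float:
    # The distinct query parameter keeps the requests from being coalesced.
    response = requests.get(URL, params={"nocache": i}, timeout=30)
    return response.elapsed.total_seconds()

with ThreadPoolExecutor(max_workers=2) as pool:
    print(list(pool.map(call, range(2))))
```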

Continuous deployment with Microsoft Azure

If a worker role, or for that matter a web role, is continuously serving both long- and short-running requests, how does continuous delivery work in this case? Obviously, pushing a new release to the cloud will abort the currently active sessions on the servers. What should the strategy be to handle this situation?
Cloud Services have production and staging slots, so you can swap them whenever you want. Continuous delivery or integration can be implemented using Visual Studio Team Services, and I would recommend it; we use it ourselves. As you say, it requires deciding when to swap the production and staging slots (for example, we did that when the user load was very low, which in our case was at night, but it may be different in your case). Slot swapping is a very fast process, and it is (as far as I know) a change of settings behind the load balancer rather than a physical redeployment.
https://azure.microsoft.com/en-us/documentation/articles/cloud-services-continuous-delivery-use-vso/#step6
UPD: I remember testing that, and my experience was that incoming connections (for example, RDP) were stable while outgoing ones were not. So I cannot guarantee that existing connections will be ended gracefully, but in my experience there were no issues.

About App Engine scalability and the 60-second timeout

I have an App Engine + Spring + Hibernate mobile/web backend configured with F2 instances and a D2 Cloud SQL instance. I've also configured warm-ups by setting the minimum number of idle instances to 1.
I have two questions:
Is it possible to configure Cloud SQL instances to scale up when needed?
My app takes about 20-40 seconds to start (after removing autowiring and applying all the optimization tips described here: https://developers.google.com/appengine/articles/spring_optimization). Still, I get high latency (~20-40 seconds) for some of the requests during load testing. I believe this is because App Engine starts new instances and it takes them this much time to start. After the instances are up and running, everything works fine until too many users connect, and then the delays return. Is there a way to solve this other than configuring more minimum live instances?
For your question regarding Cloud SQL: it currently does not have an auto-scaling capability.
As Tony said, you can't configure Cloud SQL to automatically scale according to demand. You could, of course, configure it to serve a higher expected demand from the beginning.
As an aside, I'd like to suggest a few things you could do with your app servers:
Change from F2 to F4 or F4_1G (if you're using a lot of memory) to see if that reduces your startup time.
If you're not doing it yet, you could use AppStats [1] to get a better understanding of where your app's bottlenecks are. If it's only the startup time, and the first suggestion doesn't help, I'm sorry to say that configuring more idle instances would be the answer you're looking for.
[1] https://developers.google.com/appengine/docs/java/tools/appstats

EC2 for handling demand spikes

I'm writing the backend for a mobile app that does some CPU-intensive work. We anticipate the app will not have heavy usage most of the time, but will have occasional spikes of high demand. I was thinking what we should do is reserve a couple of 24/7 servers to handle the steady state of low-demand traffic and then add and remove EC2 instances as needed to handle the spikes. The mobile app will first hit a simple load-balancing server that does a simple round-robin user distribution among all the available processing servers. The load balancer will handle bringing new EC2 instances up and turning them back off as needed.
Some questions:
I've never written something like this before; does this sound like a good strategy?
What's the best way to handle bringing new EC2 instances up and back down? I was thinking I could just create X instances ahead of time, set them up as needed (install software, etc.), and then stop each instance. The load balancer would then start and stop the instances as needed (e.g. through boto). I think this should be a lot faster and easier than trying to create new instances and install everything through a script. Good idea?
One thing I'm concerned about here is the cost of turning EC2 instances off and back on again. I looked at the AWS Usage Report and had difficulty interpreting it. I could see starting a stopped instance being a potentially costly operation, but it seems like, since I'm just starting a stopped instance rather than provisioning a new one from scratch, it shouldn't be too bad. Does that sound right?
This is a very reasonable strategy. I have used it successfully before.
You may want to look at Elastic Load Balancing (ELB) in combination with Auto Scaling. Conceptually the two should solve this exact problem.
Back when I did this around 2010, ELB had some problems with certain types of HTTP requests that prevented us from using it. I understand those issues are resolved.
Since ELB was not an option, we manually launched instances from EBS snapshots as needed and manually added them to an NGINX load balancer. That certainly could have been automated using the AWS APIs, but our peaks were so predictable (end of month) that we just tasked someone to spin up the new instances and never got around to automating the task.
When an instance is stopped, I believe the only cost you pay is for the EBS storage backing the instance and its data. Unless your instances have a huge amount of data associated with them, the EBS storage charge should be minimal. Perhaps things have changed since I last used AWS, but I would be surprised if this has changed much, if at all.
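If you do go the pre-provisioned start/stop route described in the question, a minimal boto3 sketch (the instance IDs are placeholders) looks like this; while the instances are stopped you pay only for their EBS volumes, as noted above.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Pre-configured instances that sit stopped until a spike arrives (placeholder IDs).
SPIKE_POOL = ["i-0123456789abcdef0", "i-0fedcba9876543210"]

def scale_up() -> None:
    ec2.start_instances(InstanceIds=SPIKE_POOL)
    # Block until the instances are actually running before adding them to the balancer.
    ec2.get_waiter("instance_running").wait(InstanceIds=SPIKE_POOL)

def scale_down() -> None:
    ec2.stop_instances(InstanceIds=SPIKE_POOL)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=SPIKE_POOL)
```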
First, with regard to costs: whether an instance is started from scratch or from a stopped state has no impact on cost. You are billed for the amount of compute you use over time, period.
Second, what you are looking to do is called autoscaling. You set up a launch configuration that specifies the AMI and any user-data config you are going to use. You then create an Auto Scaling group using that launch configuration, where you specify the ELB, the availability zones, and the minimum and maximum number of instances. Then you set up scaling policies that determine which scaling actions are attached to the group. Finally, you attach CloudWatch alarms to each of those policies to trigger the scaling actions.
You don't have servers in reserve that you attach to the ELB or anything like that. Everything is based on creating a single AMI that is used as the template for the servers you need.
You should read up on autoscaling at the link below:
http://aws.amazon.com/autoscaling/
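A condensed boto3 sketch of the steps just described (the AMI ID, names, ELB, and thresholds are placeholders): a launch configuration, an Auto Scaling group, a scale-out policy, and a CloudWatch alarm that triggers it.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# 1. Launch configuration: the template for every instance the group creates.
autoscaling.create_launch_configuration(
    LaunchConfigurationName="worker-lc",
    ImageId="ami-0123456789abcdef0",      # placeholder AMI with your software baked in
    InstanceType="c5.large",
)

# 2. Auto Scaling group: keeps the steady-state pair and allows bursts up to 10.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="worker-asg",
    LaunchConfigurationName="worker-lc",
    MinSize=2,
    MaxSize=10,
    AvailabilityZones=["us-east-1a", "us-east-1b"],
    LoadBalancerNames=["my-elb"],          # placeholder classic ELB
)

# 3. Scaling policy: add two instances whenever it fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="worker-asg",
    PolicyName="scale-out-on-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,
)

# 4. CloudWatch alarm: trigger the policy when average CPU stays above 70%.
cloudwatch.put_metric_alarm(
    AlarmName="worker-asg-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "worker-asg"}],
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```

Scaling in works the same way: a second policy with a negative `ScalingAdjustment` attached to a low-CPU alarm.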
