Does enable "Performance Insights" on Aurora MySQL will make any downtime? - amazon-aurora

I want to enable the new "Performance Insights" on active RDS.
Can I do that without expecting any downtime?
Thanks

Performance Insights does need you to mention a scheduling rule for modifications [1], which does indicate that it may need to restart the database for changes to take effect.
For Scheduling of Modifications, choose one of the following:
Apply during the next scheduled maintenance window – Wait to apply the Performance Insights modification until the next maintenance window.
Apply immediately – Apply the Performance Insights modification as soon as possible.
However, it does appear to be an instance level setting, which means that if you have a multi instance aurora cluster, then you can modify one instance at a time, and failovers should prevent you from having any downtime.
[1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Enabling.html

At the top it of the article it states "Enabling and disabling Performance Insights doesn't cause downtime, a reboot, or a failover."
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Enabling.html

Related

NServiceBus: is this an illusion or a best practice?

Given an NServiceBus microservice that uses MSMQ, When I deploy few instances of that service into the same machine, Am I scaling out my application?, Am I improving the performance? or one instance is enough. shall I instead have a more powerful machine to handle messages?
No, running multiple instances on a single machine will not make things run faster, it is only making execution less efficient.
However, it might be that a single instance isn't giving you the expected performance even though your system monitoring indicates there are plenty of resources to spend but not used. In that case you might want tweak the configuration of your NServiceBus endpoint by configuration the amount of allowed parallel message execution.
On the following link you see how you can increase the concurrency:
https://docs.particular.net/nservicebus/operations/tuning
You can further scaleout by actually using multiple machines but if all these endpoints share the same central database your network or database server can easily become the bottleneck. If you consider deploying or scaling out your endpoints across multiple machines make sure that any storage solutions are also scaled out for these not to become your bottleneck.
Zero downtime upgrades/deployments
The only reason to have multiple instance on the same box is for example when deploying a new version, you can temporarily run the current and the new version side-by-side to achieve zero downtime deployments.

ElastiCache Maintenance Window Availability

We are planning to use ElastiCache (Redis) instead of our own redis cluster. However, the "maintenance window" setting creates some questions,
If I use a multi-az replicated cluster, will elasticache failover to available replicas during maintenance windows or does the entire cluster go down during maintenance?
How long does it generally take?
We can also use MemCached instead of Redis, does it have better availability situation during maintenance windows?
How do others handle ElastiCache manintenance windows? Just go woth the downtime?
Thanks!
There are usually 2 maintenance AWS does.
Continuous managed maintenance updates.
Service updates
While creating cluster you need to specify a 60 min maintenance window. Usually all the maintenance updates (1) will happen during that time.
For every service updates you will recieve notifications when there is a scheduled one. Notification will be in the form of email or a notification on the elasticache page etc... Based on the notification you can reschedule the service updates to a comfortable time. If you fail to reschedule it will by default pick you maintenance window and apply the service updates.
Basically during the maintenance updates, AWS will replace your node with a new node with required updates. If you have primary/replica set up with multi az and auto failover set to true, then during maintenance window of the primary node, you replica will be promoted to master and your read/write requests will be served from there. So ideally you don't see any issue during maintenance maybe a few second downtime to promote the replica as master.
If you either don't set up multiaz with auto failover to true or your elasticache has just one node, you will see downtime during maintenance window.
Refer AWS documentation
How long does it generally take?
Under 60 minutes:
"If a "maintenance" event is scheduled for a given week, it will be initiated and completed at some point during the 60 minute maintenance window you identify."
How often:
Software patching occurs infrequently (typically once every few months) and should seldom require more than a fraction of your maintenance window. If you do not specify a preferred weekly maintenance window when creating your Cache Cluster, a 60 minute default value is assigned.
http://aws.amazon.com/elasticache/faqs/

EC2 for handling demand spikes

I'm writing the backend for a mobile app that does some cpu intensive work. We anticipate the app will not have heavy usage most of the time, but will have occasional spikes of high demand. I was thinking what we should do is reserve a couple of 24/7 servers to handle the steady-state of low demand traffic and then add and remove EC2 instances as needed to handle the spikes. The mobile app will first hit a simple load balancing server that does a simple round-robin user distribution among all the available processing servers. The load balancer will handle bringing new EC2 instances up and turning them back off as needed.
Some questions:
I've never written something like this before, does this sound like a good strategy?
What's the best way to handle bringing new EC2 instances up and back down? I was thinking I could just create X instances ahead of time, set them up as needed (install software, etc), and then stop each instance. The load balancer will then start and stop the instances as needed (eg through boto). I think this should be a lot faster and easier than trying to create new instances and install everything through a script or something. Good idea?
One thing I'm concerned about here is the cost of turning EC2 instances off and back on again. I looked at the AWS Usage Report and had difficulty interpreting it. I could see starting a stopped instance being a potentially costly operation. But it seems like since I'm just starting a stopped instance rather than provisioning a new one from scratch it shouldn't be too bad. Does that sound right?
This is a very reasonable strategy. I used it successfully before.
You may want to look at Elastic Load Balancing (ELB) in combination with Auto Scaling. Conceptually the two should solve this exact problem.
Back when I did this around 2010, ELB had some problems with certain types of HTTP requests that prevented us from using it. I understand those issues are resolved.
Since ELB was not an option, we manually launched instances from EBS snapshots as needed and manually added them to an NGinX load balancer. That certainly could have been automated using the AWS APIs, but our peaks were so predictable (end of month) that we just tasked someone to spin up the new instances and didn't get around to automating the task.
When an instance is stopped, I believe the only cost that you pay is for the EBS storage backing the instance and its data. Unless your instances have a huge amount of data associated, the EBS storage charge should be minimal. Perhaps things have changed since I last used AWS, but I would be surprised if this changed much if at all.
First with regards to costs, whether an instance is started from scratch or from a stopped state has no impact on cost. You are billed for the amount of compute units you use over time, period.
Second, what you are looking to do is called autoscaling. What you do is setup up a launch config that specifies an AMI you are going to use (along with any user-data configs you are using, the ELB and availiabilty zones you are going to use, min and max number of instances, etc. You set up a scaling group using that launch config. Then you set up scaling policies to determine what scaling actions are going to be attached to the group. You then attach cloud watch alarms to each of those policies to trigger the scaling actions.
You don't have servers in reserve that you attach to the ELB or anything like that. Everything is based on creating a single AMI that is used as the template for the servers you need.
You should read up on autoscaling at the link below:
http://aws.amazon.com/autoscaling/

How do I hard stop an Azure role?

Here's my scenario: my Azure web role does a lot of work in OnStart() and produces a huge debug trace that is uploaded to Blob Storage.
Now OnStart() hangs for whatever reason and I look into Blob Storage and see that trace has not been updated for several minutes already. So I decide the role is beyond repair and I want to shut it down immediately so that I can update the role with another package and start it again.
The problem is when I hit "Stop" in the Management Portal it takes up to ten minutes to stop the role - I guess it tries to convince the role to stop gracefully and wait for several minutes.
Can I somehow make the role stop immediately without letting it stop gracefully?
I wonder if deleting the deployment (that's presumably what you're going to do after stopping it?) is faster, but I'm not sure. As far as I know, there's only one kind of "stop," so no, I don't think there's a way to force a faster stop.
Have a look # Windows Azure Platform PowerShell Cmdlets
It should give you at least the same functionality and probably more control over the actions. You could also request the current status as it is not always reflected immediately in the Silverlight portal.

Single instance Amazon EC2

We're running a lightweight web app on a single EC2 server instance, which is fine for our needs, but we're wondering about monitoring and restarting it if it goes down.
We have a separate non-Amazon server we'd like to use to monitor the EC2 and start a fresh instance if necessary and shut down the old one. All our user data is on Elastic Storage, so we're not too worried about losing anything.
I was wondering if anyone has any experience of using EC2 in this way, and in particular of automating the process of starting the new instance? We have no problem creating something from scratch, but it seems like it should be a solved problem, so I was wondering if anyone has any tips, links, scripts, tutorials, etc to share.
Thanks.
You should have a look at puppet and its support for AWS. I would also look at the RightScale AWS library as well as this post about starting a server with the RightScale scripts. You may also find this article on web serving with EC2 useful. I have done something similar to this but without the external monitoring, the node monitored itself and shut down when it was no longer needed then a new one would start up later when there was more work to do.
Couple of points:
You MUST MUST MUST back up your Amazon EBS volume.
They claim "better" reliability, but not 100%, and it's SEVERAL orders of magnitude off of S3's "12 9's" of durability. S3 durability >> EBS durability. That's a fact. EBS supports a "snapshots" feature which backs up your storage efficiently and incrementally to S3. Also, with EBS snapshots, you only pay for the compressed deltas, which is typically far far less than the allocated volume size. In another life, I've sent lost-volume emails to smaller customers like you who "thought" that EBS was "durable" and trusted it with the only copy of a mission-critical database... it's heartbreaking.
Your Q: automating start-up of a new instance
The design path you mention is relatively untraveled; here's why... Lots of companies run redundant "hot-spare" instances where the second instance is booted and running. This allows rapid failover (seconds) in the event of "failure" (could be hardware or software). The issue with a "cold-spare" is that it's harder to keep the machine up to date and ready to pick up where the old box left off. More important, it's tricky to VALIDATE that the spare is capable of successfully recovering your production service. Hardware is more reliable than untested software systems. TEST TEST TEST. If you haven't tested your fail-over, it doesn't work.
The simple automation of starting a new EBS instance is easy, bordering on trivial. It's just a one-line bash script calling the EC2 command-line tools. What's tricky is everything on top of that. Such a solution pretty much implies a fully 100% automated deployment process. And this is all specific to your application. Can your app pull down all the data it needs to run (maybe it's stored in S3?). Can you kill you instance today and boot a new instance with 0.000 manual setup/install steps?
Or, you may be talking about a scenario I'll call "re-instancing an EBS volume":
EC2 box dies (root volume is EBS)
Force detach EBS volume
Boot new EC2 instance with the EBS volume
... That mostly works. The gotchas:
Doesn't protect against EBS failures, either total volume loss or an availability loss
Recovery time is O(minutes) assuming everything works just right
Your services need to be configured to restart automatically. It does no good to bring the box back if Nginx isn't running.
Your DNS routes or other services or whatever need to be ok with the IP-address changing. This can be worked around with ElasticIP.
How are your host SSH keys handled? Same name, new host key can break SSH-based automation when it gets the strong-warning for host-key-changed.
I don't have proof of this (other than seeing it happen once), but I believe that EC2/EBS _already_does_this_ automatically for boot-from-EBS instances
Again, the hard part here is on your plate. Can you stop your production service today and bring it up RELIABLY on a new instance? If so, the EC2 part of the story is really really easy.
As a side point:
All our user data is on Elastic Storage, so we're not too worried about losing anything.
I'd strongly suggest to regularly snapshot your EBS (Elastic Block Storage) to S3 if you are not doing that already.
You can use an autoscale group with a min/max/desired quantity of 1. Place the instance behind an ELB and have the autoscale group be triggered by the ELB healthy node count. This allows you to have built in monitoring by cloudwatch and the ELB health check. Anytime there is an issue the instance be replaced by the autoscale service.
If you have not checked 'Protect against accidental termination' you might want to do so.
Even if you have disabled 'Detailed Monitoring' for your instance you should still see the 'StatusCheckFailed' metric for your instance over which you can configure an alarm (In the CloudWatch dashboard)
Your application (hosted in a different server) should receive the alarm and start the instance using the AWS API (or CLI)
Since you have protected against accidental termination you would never need to spawn a new instance.

Resources