EC2 instance out of service - amazon-ec2

I have an EC2 instance that I use to serve Node.js applications via GitHub Actions, and everything normally works as expected. But sometimes, after a day or two, the instance becomes unreachable by anything: it shows an offline flag on the GitHub Actions tab, yet the AWS EC2 console still shows it as running and all stats look normal. I have no idea why it behaves like this.
Every time this happens, I have to stop and start the instance manually to get my server back up and running.
[Screenshot: EC2 monitoring stats]
I suspected resource limitations were causing the instance to hang, but the problem still occurs after upgrading the instance type to t2.xlarge.
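One way to confirm whether AWS itself sees a problem is to query the instance's status checks from outside the box instead of trusting the console view. A minimal sketch, assuming boto3, a hypothetical region, and a hypothetical instance ID:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# "i-0123456789abcdef0" is a hypothetical instance ID
resp = ec2.describe_instance_status(
    InstanceIds=["i-0123456789abcdef0"],
    IncludeAllInstances=True,  # also report instances that are not running
)

for status in resp["InstanceStatuses"]:
    print("State:          ", status["InstanceState"]["Name"])
    print("System status:  ", status["SystemStatus"]["Status"])    # AWS-side (hardware/network) checks
    print("Instance status:", status["InstanceStatus"]["Status"])  # OS-level reachability checks
```

If the instance status check is failing while the system status check passes, the problem is inside the OS (hung process, exhausted memory, etc.) rather than with the underlying host.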

Related

My EC2 instances keep becoming inaccessible?

I had a set of t2.micro instances behind a load balancer that kept dying. By dying I mean the REST API running on them would stop responding and I couldn't SSH to the instances. I would have to launch new instances from a saved AMI.
I decided there must be something wrong with the AMI, so I rebuilt the server, got everything running, and created a new AMI.
After about a week the same problem was back. Basic monitoring shows little to no activity on the servers.
The system log on the instance shows that the server starts fine and my Node.js REST API is launched.
Has anyone else experienced this and been able to find a solution?

Amazon Auto Scale is ping ponging since the health checks fail

I have my Auto Scaling Group configured like so:
When my load balancer registers an instance, the registration fails because the health check fails while the instance is still starting up.
For what it's worth, my health check is as follows:
I've added the "web-master" instance myself; it's not part of the Auto Scaling group. What normally happens is that the registering instance fails to add itself to the load balancer, terminates, and a new one pops up. This repeats countless times until I manually intervene. What am I doing wrong? Is there any way to delay the ELB health check, or at least have it wait until the instance is fully registered?
An instance shouldn't register until it has passed its availability tests. Could it be that it's your application's spin-up time that's the issue, not the instance?
I guess my question is: is your application set up on the port you're pinging? I use a lightweight "healthcheck" app in IIS for health checks to get around the "ping pong" effect.
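If the instances simply need more time before failed ELB health checks count against them, the Auto Scaling group's health check grace period is the usual knob for this. A minimal sketch, assuming boto3, a hypothetical region, and a hypothetical group name:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")  # assumed region

# "web-asg" is a hypothetical Auto Scaling group name
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    HealthCheckType="ELB",        # use the load balancer's health check for instance health
    HealthCheckGracePeriod=300,   # ignore health check failures for 5 minutes after launch
)
```

The grace period should be at least as long as your application's worst-case startup time, otherwise the terminate-and-relaunch cycle continues.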

Change running ec2 instance type

Does Amazon offer, or will it ever offer, a feature that lets users change their EC2 instance type while the server is running? For example, going from a t1.micro to an m1.large without shutting anything down. I know nothing about VMs or what would be involved, so I'm not sure whether this is even possible, how difficult it would be (I'd assume difficult enough if they haven't rolled it out), or whether there are any plans to do so.
No, the instance type cannot be changed while the instance is running. To change the instance type you must stop the instance, change the instance type, and then start it again.
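A minimal sketch of that stop / modify / start sequence, assuming boto3, a hypothetical region, a hypothetical instance ID, and a hypothetical target type:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
instance_id = "i-0123456789abcdef0"                 # hypothetical instance ID

# 1. Stop the instance and wait until it is fully stopped
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# 2. Change the instance type (only possible while the instance is stopped)
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m1.large"},  # hypothetical target type
)

# 3. Start it again and wait until it is running
ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

Note that stopping and starting (as opposed to rebooting) moves an EBS-backed instance to new hardware and, without an Elastic IP, changes its public IP address.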

EC2 dashboard shows a running instance, even when the instance is not running

The EC2 dashboard shows a running instance even when the instance is not running. I also see an EBS volume in an "in-use" status. I am confused: is the machine running or not?
I have seen that happen when shutting down a Linux instance from inside the machine (with shutdown now from the command line).
If the console says that the instance is running even though you shut it down, you should probably shut it down from the console as well (to avoid being billed).
Sometimes there are problems with the hardware on the host. The instance shows as running, but you cannot connect and you cannot use any services on that instance. The best thing to do in this situation is to post a message on the EC2 forums and ask them to look at your instance.
They're usually pretty quick to respond, though they don't make any guarantees. They can force the machine into a stopped state; whether they can fix the issue without you losing your data will depend on what is actually wrong with the instance.
This happens from time to time with my instances as well.
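When the dashboard and reality seem to disagree, it can help to read the instance and volume state directly from the API rather than the console. A minimal sketch, assuming boto3, a hypothetical region, and a hypothetical instance ID:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
instance_id = "i-0123456789abcdef0"                 # hypothetical instance ID

resp = ec2.describe_instances(InstanceIds=[instance_id])
instance = resp["Reservations"][0]["Instances"][0]

# The API state is the authoritative value if the dashboard looks stale
print("Instance state:", instance["State"]["Name"])  # e.g. running, stopping, stopped

# Attached EBS volumes remain "in-use" until the instance is actually stopped or terminated
for mapping in instance.get("BlockDeviceMappings", []):
    vol = ec2.describe_volumes(VolumeIds=[mapping["Ebs"]["VolumeId"]])["Volumes"][0]
    print(mapping["DeviceName"], vol["VolumeId"], vol["State"])
```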

Single instance Amazon EC2

We're running a lightweight web app on a single EC2 server instance, which is fine for our needs, but we're wondering about monitoring and restarting it if it goes down.
We have a separate non-Amazon server we'd like to use to monitor the EC2 and start a fresh instance if necessary and shut down the old one. All our user data is on Elastic Storage, so we're not too worried about losing anything.
I was wondering if anyone has any experience of using EC2 in this way, and in particular of automating the process of starting the new instance? We have no problem creating something from scratch, but it seems like it should be a solved problem, so I was wondering if anyone has any tips, links, scripts, tutorials, etc to share.
Thanks.
You should have a look at Puppet and its support for AWS. I would also look at the RightScale AWS library, as well as this post about starting a server with the RightScale scripts. You may also find this article on web serving with EC2 useful. I have done something similar to this, but without the external monitoring: the node monitored itself and shut down when it was no longer needed, and a new one would start up later when there was more work to do.
Couple of points:
You MUST MUST MUST back up your Amazon EBS volume.
They claim "better" reliability, but not 100%, and it's SEVERAL orders of magnitude off of S3's "11 9's" of durability. S3 durability >> EBS durability. That's a fact. EBS supports a "snapshots" feature which backs up your storage efficiently and incrementally to S3. Also, with EBS snapshots, you only pay for the compressed deltas, which is typically far less than the allocated volume size. In another life, I've sent lost-volume emails to smaller customers like you who "thought" that EBS was "durable" and trusted it with the only copy of a mission-critical database... it's heartbreaking.
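A minimal sketch of taking such a snapshot programmatically, assuming boto3, a hypothetical region, and a hypothetical volume ID; in practice you would run something like this on a schedule (cron, a timed job, etc.):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# "vol-0123456789abcdef0" is a hypothetical EBS volume ID
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly backup",  # snapshots are incremental; only changed blocks are stored
)
print("Started snapshot:", snapshot["SnapshotId"])

# Optionally wait for the snapshot to complete before reporting success
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])
```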
Your Q: automating start-up of a new instance
The design path you mention is relatively untraveled; here's why... Lots of companies run redundant "hot-spare" instances where the second instance is booted and running. This allows rapid failover (seconds) in the event of "failure" (could be hardware or software). The issue with a "cold-spare" is that it's harder to keep the machine up to date and ready to pick up where the old box left off. More important, it's tricky to VALIDATE that the spare is capable of successfully recovering your production service. Hardware is more reliable than untested software systems. TEST TEST TEST. If you haven't tested your fail-over, it doesn't work.
The simple automation of starting a new EBS-backed instance is easy, bordering on trivial; it's just a one-line bash script calling the EC2 command-line tools. What's tricky is everything on top of that, and it's all specific to your application. Such a solution pretty much implies a fully 100% automated deployment process. Can your app pull down all the data it needs to run (maybe it's stored in S3)? Can you kill your instance today and boot a new instance with 0.000 manual setup/install steps?
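For the "one-liner" launch itself, here is a minimal sketch, assuming boto3, a hypothetical region, and hypothetical AMI and key pair names:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# "ami-0123456789abcdef0" and "my-key" are hypothetical placeholders
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t2.micro",
    KeyName="my-key",
    MinCount=1,
    MaxCount=1,
)
print("Launched:", resp["Instances"][0]["InstanceId"])
```

As the answer says, the launch is the easy part; making the new instance pick up your data and configuration with zero manual steps is where the real work lives.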
Or, you may be talking about a scenario I'll call "re-instancing an EBS volume":
EC2 box dies (root volume is EBS)
Force detach EBS volume
Boot new EC2 instance with the EBS volume
... That mostly works. The gotchas:
Doesn't protect against EBS failures, either total volume loss or an availability loss
Recovery time is O(minutes) assuming everything works just right
Your services need to be configured to restart automatically. It does no good to bring the box back if Nginx isn't running.
Your DNS routes, or other services, or whatever else need to be OK with the IP address changing. This can be worked around with an Elastic IP.
How are your host SSH keys handled? The same hostname with a new host key can break SSH-based automation when it triggers the host-key-changed warning.
I don't have proof of this (other than seeing it happen once), but I believe that EC2/EBS already does this automatically for boot-from-EBS instances.
Again, the hard part here is on your plate. Can you stop your production service today and bring it up RELIABLY on a new instance? If so, the EC2 part of the story is really really easy.
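A minimal sketch of the force-detach / re-attach steps described in the scenario above, assuming boto3, a hypothetical region, hypothetical instance and volume IDs, and a replacement instance that is already stopped with its own root volume removed:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

dead_instance = "i-0dead000000000000"   # hypothetical: the unresponsive instance
new_instance = "i-0new0000000000000"    # hypothetical: the stopped replacement instance
volume_id = "vol-0123456789abcdef0"     # hypothetical: the EBS root volume to carry over

# 1. Force-detach the volume from the dead instance
ec2.detach_volume(VolumeId=volume_id, InstanceId=dead_instance, Force=True)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

# 2. Attach it to the replacement instance as its root device
ec2.attach_volume(VolumeId=volume_id, InstanceId=new_instance, Device="/dev/xvda")
ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

# 3. Start the replacement instance
ec2.start_instances(InstanceIds=[new_instance])
```

This covers only the volume shuffle; the gotchas above (service auto-start, DNS/IP changes, host keys) still apply once the replacement boots.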
As a side point:
All our user data is on Elastic Storage, so we're not too worried about losing anything.
I'd strongly suggest to regularly snapshot your EBS (Elastic Block Storage) to S3 if you are not doing that already.
You can use an Auto Scaling group with a min/max/desired quantity of 1. Place the instance behind an ELB and have the Auto Scaling group use the ELB's healthy-node count as its health check. This gives you built-in monitoring via CloudWatch and the ELB health check. Any time there is an issue, the instance will be replaced by the Auto Scaling service.
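A minimal sketch of such a single-instance Auto Scaling group, assuming boto3, a hypothetical region, and hypothetical launch template, subnet, and load balancer names:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")  # assumed region

# All names below are hypothetical placeholders
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="single-web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
    MinSize=1,
    MaxSize=1,
    DesiredCapacity=1,                  # exactly one instance at all times
    VPCZoneIdentifier="subnet-0123456789abcdef0",
    LoadBalancerNames=["web-elb"],
    HealthCheckType="ELB",              # replace the instance when the ELB marks it unhealthy
    HealthCheckGracePeriod=300,
)
```

With min, max, and desired all set to 1, the group never scales; it only replaces the instance when it fails its health check.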
If you have not checked 'Protect against accidental termination' you might want to do so.
Even if you have disabled 'Detailed Monitoring' for your instance, you should still see the 'StatusCheckFailed' metric for the instance, on which you can configure an alarm (in the CloudWatch dashboard).
Your application (hosted on a different server) should receive the alarm and start the instance using the AWS API (or CLI).
Since you have protected against accidental termination you would never need to spawn a new instance.
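A minimal sketch of that setup, assuming boto3, a hypothetical region, a hypothetical instance ID, and a hypothetical SNS topic that the external monitor subscribes to:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region
ec2 = boto3.client("ec2", region_name="us-east-1")

instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

# Alarm on the StatusCheckFailed metric, notifying a hypothetical SNS topic
cloudwatch.put_metric_alarm(
    AlarmName="instance-status-check-failed",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed",
    Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=3,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:instance-alerts"],  # hypothetical topic ARN
)

# What the external monitor would run on receiving the alarm: stop, then start the instance
def restart_instance(iid: str) -> None:
    ec2.stop_instances(InstanceIds=[iid])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[iid])
    ec2.start_instances(InstanceIds=[iid])
    ec2.get_waiter("instance_running").wait(InstanceIds=[iid])
```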
