EC2 GPU instance (p3.2xlarge) failing after creating its AMI - amazon-ec2

I have a p3.2xlarge instance running the Deep Learning AMI (Ubuntu). I installed various packages on it and also attached EFS to the machine. I wanted to implement autoscaling, so I created an image from this instance. Now both the instances launched from the AMI and the original instance seem to be stuck at the same point. When I look at the original instance's screenshot, this is the last line:
[22.675975] systemd-journald[464]: Received request to flush runtime journal from PID 1

Related

Change Instance type of a cluster registered ec2 instance

I have an Amazon EC2 instance which is registered to a cluster of Amazon ECS.
And I want to change this instance's type from c4.large to c4.8xlarge.
I was able to change its type from c4.large to c4.8xlarge in the AWS console. But after the change, I found this error in the /var/log/ecs/ecs-agent.log.20XX-XX-XX-XX file:
[ERROR] Could not register module="api client" err="ClientException: Container instance type changes are not supported. Container instance XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX was previously registered as c4.large.
Is it possible to change an EC2 instance's type and re-register it to a cluster?
I think deregistering it first and then registering it again might work, but I'm afraid this could cause something irreversible in my AWS working environment, so I haven't tried it yet.
To solve this connection problem between the agent and cluster, just delete the file /var/lib/ecs/data/ecs_agent_data.json and restart docker and ECS.
After that, a new container instance will be created in your cluster with the new size.
sudo rm /var/lib/ecs/data/ecs_agent_data.json
sudo service docker restart
sudo start ecs
Then you can go to the ECS cluster console and deregister the old container instance.
UPDATE:
As #florins and #MBear commented below, AWS has changed the data file on ECS instances, so the commands are now:
sudo rm /var/lib/ecs/data/agent.db
sudo service docker restart
sudo start ecs
As of March 2021 / AMI image ami-0db98e57137013b2d, /var/lib/ecs/data/ecs_agent_data.json mentioned in the last useful answer does not exist. For me, the commands to execute on the changed instance were:
sudo rm /var/lib/ecs/data/agent.db
sudo service docker restart
After that, it was possible to deploy containers to the instance, without fresh registration (AWS automatically registered a second ECS container instance of the new type). I did have a leftover container instance with the resources of the old instance type to remove.
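The two answers above differ only in which state file is present on the instance. Here is a minimal Python sketch of that check, assuming it runs with root privileges on the container instance. The function name `remove_agent_state` is hypothetical; the file names and directory are the ones quoted in the answers. Restarting Docker and the ECS agent afterwards is still done with the commands shown above.

```python
from pathlib import Path

# State file names quoted in the answers above; which one exists
# depends on the age of the ECS-optimized AMI.
STATE_FILES = ("ecs_agent_data.json", "agent.db")


def remove_agent_state(data_dir="/var/lib/ecs/data"):
    """Delete whichever ECS agent state files exist; return their names."""
    removed = []
    for name in STATE_FILES:
        path = Path(data_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

Calling `remove_agent_state()` returns an empty list when neither file is present, which is a hint that the AMI stores agent state somewhere else and the commands above may not apply as written.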
You can't do this. Per their docs:
The type of EC2 instance that you choose for your container instances determines the resources available in your cluster. Amazon EC2 provides different instance types, each with different CPU, memory, storage, and networking capacity that you can use to run your tasks. For more information, see Amazon EC2 Instances.
This means that when you launch a container on an instance, the agent gathers metadata about the instance in order to run it: CPU units, memory, and so on. If you change the instance type, much of that metadata is no longer valid. The agent is aware of this and reports it as an error.
You should spin up a new instance of the new type, register it to the cluster, and let the task run on it. If it's a service, just terminate the old instance and let the service run on the new one.
I can't think of any real reason why terminating your old instance would cause something irreversible, unless it is misconfigured or fragile because of user-specific settings. By default this would not cause anything destructive.
As an alternative approach, if the EC2 instance does not store any valuable data, a new instance can be started using the old instance as a template. This copies all existing settings and takes just a few clicks and a few minutes.
Select the EC2 instance and then "Actions -> Images and templates -> Launch more like this". Just change the instance type.
When the instance is running, go to the ECS cluster, open the "ECS instances" tab, and activate the newly created instance.
Shut down the old instance.
Update your task, perhaps to use more CPU and memory, and update the service to use the new task revision.
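Several answers in this thread end with a leftover container instance that has to be deregistered by hand. The sketch below shows how that cleanup might look with a boto3 ECS client passed in as `ecs`. The function name is hypothetical, and treating "same EC2 instance ID, agent not connected" as the marker for the stale registration is an assumption, not something the answers confirm. `list_container_instances` is paginated, so a cluster with more than 100 instances would need a paginator.

```python
def deregister_stale_container_instances(ecs, cluster, ec2_instance_id):
    """Deregister container instances for one EC2 instance whose agent is gone.

    `ecs` is expected to behave like a boto3 ECS client.
    Returns the ARNs that were deregistered.
    """
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if not arns:
        return []

    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )["containerInstances"]

    # Assumption: the old registration is the one whose agent is disconnected.
    stale = [
        ci["containerInstanceArn"]
        for ci in described
        if ci["ec2InstanceId"] == ec2_instance_id and not ci["agentConnected"]
    ]

    for arn in stale:
        # force=True deregisters even if the old entry still lists tasks.
        ecs.deregister_container_instance(
            cluster=cluster, containerInstance=arn, force=True
        )
    return stale
```

With boto3 installed, the client would typically come from `boto3.client("ecs")`. It is worth printing the list of stale ARNs and checking it against the console before deregistering anything in a production cluster.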

Having problems making AWS ElasticBeanstalk work with Sun JDK

My application needs Tomcat to run on the Sun JDK, but the default AWS Elastic Beanstalk AMI comes with OpenJDK, so I wanted to change it to the Sun JDK. This seemingly simple task is turning out to be not so simple. Here is what I did:
On the EC2 instance that powers my Elastic Beanstalk environment, I installed the Sun JDK by downloading the rpm manually and then running rpm -i <jdk-rpm-file.rpm>.
Then I updated the java alternatives as listed here.
Next I restarted the app server to make sure that the feature that requires Sun JDK is working. It works.
Next I created an image by right-clicking on the EC2 instance and selecting "Create Image (EBS AMI)".
I waited for the AMI to be created, then noted the AMI ID.
I set the "Custom AMI ID" in the configuration of the test environment to the newly created AMI.
I applied the change. This triggers an update of the environment.
Now comes the problem. As soon as it updates the environment it creates a new EC2 instance to connect to this environment.
Then, after "adding" the instance, it starts throwing this warning message: "Failed to retrieve status of instance 'i-eb800c88' 2 consecutive time(s). Elastic Beanstalk will attempt to retrieve status up to 10 consecutive times before terminating the instance."
This continues for 10 tries and then it kills the instance and adds another instance and this continues for a long time.
I am not sure where I am going wrong. Any pointers appreciated.
Recently I got a response from Saad on the AWS team, and it solved my problem. Here is his answer:
You will need to launch the AMI outside of Elastic Beanstalk (directly from the EC2 console), log into it and do your customizations then burn the AMI. Otherwise, the Host Manager might get corrupted and your instance will fail to come up.
The following documentation highlights the steps needed to create an AMI compatible with Elastic Beanstalk: http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/index.html?using-features.customami.html.

Amazon Web Service: Difference between Images and Instances

What is the difference between starting an AWS image and starting an instance?
Example:
I notice that when I run an AWS image using boto I can only stop it, while when I run an AWS instance using boto I can only terminate it.
Think of an EC2 instance as a single running server with CPU, memory, hard disk, networking, etc. Any changes you make to that instance affect only that instance.
Think of an AMI (Amazon Machine Image) as an exact copy of the root file system that gets copied to the hard disk when you start a new instance. The AMI is a hard disk sitting on a shelf. You make an exact copy of the hard disk on the shelf, install the new hard disk in a server, and turn the server on. You can do this for as many servers as you'd like to start without affecting the master copy.
The AMI defines the initial state of each instance. Each instance changes as it runs, but you can never change the original AMI once it has been created (other than to delete it).
There are more details that refine this conceptual model, but that's the basics.
Specific to the wording in your question:
Sometimes we say we're "starting an AMI", and sometimes we say we're "starting an instance". We mean the same thing. We're really starting an instance using an AMI as the template.
We never say we're "stopping/terminating an image" or "stopping an AMI" because, once started, it's really the instance that's running.
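The "hard disk on a shelf" model above can be written down as a toy Python sketch. It is purely illustrative and does not call AWS. The class and function names (`Image`, `Instance`, `launch`) are hypothetical, and the "disk" is just a dictionary of path to contents.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Image:
    """The disk on the shelf: a read-only template."""
    image_id: str
    root_fs: MappingProxyType  # read-only mapping of path -> contents


@dataclass
class Instance:
    """A running server with its own private copy of the image's disk."""
    instance_id: str
    image_id: str
    disk: dict
    state: str = "running"


def launch(image, instance_id):
    """Copy the image's file system onto a fresh disk and boot from it."""
    return Instance(instance_id, image.image_id, dict(image.root_fs))


ami = Image("ami-example", MappingProxyType({"/etc/motd": "hello"}))
first = launch(ami, "i-first")
second = launch(ami, "i-second")

# Changing one instance affects neither its sibling nor the image.
first.disk["/etc/motd"] = "changed"
```

After the last line, `second.disk` and `ami.root_fs` still hold the original contents, which matches the point that you stop or terminate instances, never the image itself.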
You can have one or more instances running that are derived from an image (AMI). Here is a good little tutorial, that's a bit old mind you, talking about how you can convert an Instance to an AMI ... which you can then redeploy one or many times:
http://webkist.wordpress.com/2010/03/16/creating-an-amazon-ec2-ebs-ami-from-a-running-instance/
What is an AMI: Amazon Machine Image
http://en.wikipedia.org/wiki/Amazon_Machine_Image
Technically, you can't start an AMI. You can start an instance that is derived from an AMI.

Why is an AMI tied to a region on ec2?

I understand that when I launch an instance on EC2, the instance has to be located in a particular data center, and that you can't change that after launching. I also understand that an AMI is created from an instance.
But what I don't understand is why, when I launch an instance from an AMI, I can't specify which region I want it to run in. It seems like it shouldn't matter: once the AMI is created, you should be able to launch it in any region. What does the AMI contain that ties it to a region, and why?
Kernels. The kernel IDs change across regions (don't ask me why).
This means an AMI that specifies a kernel ID to boot with can only be booted in the region where that kernel ID exists.
An AMI is region-specific because it is essentially a stored software configuration (for example, an operating system, an application server, and applications). From an AMI, you launch an instance, which is a copy of the AMI running as a virtual server in the cloud. If AMIs were a global service, all of them would be stored in one place and accessed from every region, so pulling an image and creating an instance from it would take longer. Keeping the AMI in a specific region lets instances launch there quickly.

Stop an Amazon EC2 instance with "instance store"

I have an EC2 instance with an "instance store" device as its root device (I did not know the difference between it and EBS before launching it). I would like to stop it, but I cannot do so with the ec2-stop-instances command. The output is:
Client.UnsupportedOperation: The instance 'i-XXXXXXXX' does not have an 'ebs' root device type and cannot be stopped.
Does anybody know how to stop it from the Windows console? (I am not the owner of the Amazon account and I won't be able to contact him for weeks.)
Thanks in advance.
EC2 instances with an "instance store" root device cannot be stopped. They can only be terminated.
If you would like the ability to stop an instance, I suggest recreating the instance with an EBS root device. These types of instances support the ec2-stop-instances command. You can think of this command as a "pause": the instance can be started again at any time, and you are not charged instance usage for the time it is stopped (the EBS volumes themselves are still billed as storage).
Amazon announced the ability to boot instances from an EBS root volume only in December 2009, so you will find that older documentation and tutorials ignore the extended command-set that came with the EBS root volumes.
Further reading: Amazon EC2 Instances Now Can Boot from Amazon EBS
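The rule in this answer can be checked programmatically before attempting a stop. The sketch below assumes `ec2` behaves like a boto3 EC2 client, whose `describe_instances` response reports `RootDeviceType` as either `ebs` or `instance-store`. The function name `stop_if_ebs_backed` is hypothetical.

```python
def stop_if_ebs_backed(ec2, instance_id):
    """Request a stop only when the root device is EBS.

    Returns True if a stop was requested, or False if the instance is
    instance-store backed and can therefore only be terminated.
    """
    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]

    if instance["RootDeviceType"] != "ebs":
        # Stopping would fail with Client.UnsupportedOperation.
        return False

    ec2.stop_instances(InstanceIds=[instance_id])
    return True
```

With boto3 installed, the client would typically come from `boto3.client("ec2")`. Returning False rather than terminating keeps the destructive choice with the caller, since data on an instance store volume is lost on termination.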
I am not the owner of the Amazon account and I won't be able to contact with him for weeks.
If you really want to "pause" this instance for a few weeks, you can create a machine image (AMI) of your instance and terminate it.
You will then be able to launch a new instance with your private AMI when you want to "resume" this instance. It will not be the same instance, as in it will have a different instance ID and a different IP, but you will be cloning the setup of your instance.
The methods to build an AMI differ if your instance is running Windows or Linux, but you should be able to find adequate information on the web about both scenarios.

Resources