I am working on an HDFS high availability project.
I have configured Hadoop on one Amazon EC2 instance. It is a small instance (AMI: Ubuntu Server).
I want to form a cluster of EC2 instances, so I am thinking of replicating the same machine. Does anybody know how to duplicate this instance onto another EC2 instance? If so, please share.
Thanks!
If your instance is EBS-backed, you can create an image (AMI) of it, which snapshots its volumes, and then launch as many instances as you want from that image.
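As a rough illustration of that approach, here is a minimal Python sketch. It assumes a boto3 EC2 client is passed in (for example `boto3.client("ec2")`); the function name, image name and `t2.small` instance type are placeholders chosen for this example, not anything from the question.

```python
def clone_instance(ec2, instance_id, image_name, count, instance_type="t2.small"):
    """Create an AMI from an EBS-backed instance and launch `count` copies.

    `ec2` is assumed to be a boto3 EC2 client. The instance type is a
    placeholder; use whatever matches the original machine.
    """
    # Ask EC2 to image the source instance (this snapshots its EBS volumes).
    image_id = ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]

    # Block until the AMI is usable; launching from a pending image fails.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Launch the requested number of identical instances from the new AMI.
    launched = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    return [instance["InstanceId"] for instance in launched["Instances"]]
```

Each copy boots with the same Hadoop configuration as the original, so per-node settings such as hostnames in the cluster configuration would still need adjusting afterwards.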
Related
I'm looking for some advice. This may seem like a silly question, but I am having trouble understanding how AWS Elastic Beanstalk autoscaling works and what its best practices are.
I have a Laravel application that is deployed to AWS Elastic Beanstalk through Bitbucket Pipelines. This all works and deploys successfully.
My issue is that when autoscaling triggers, it brings up a new EC2 instance and then load balances the traffic. The problem is that the new EC2 instance in the fleet starts from a blank Amazon Linux 2 AMI, so it just shows the nginx welcome page.
I think the issue is that it's using a blank AMI and not getting my application. I am guessing I could create an image from the EC2 instance running my application and then scale with that, but I would have to do that every time I do a deployment.
Can you configure the auto scaling group to replicate the running EC2 instance?
Any help or advice as to the best way to accomplish autoscaling with my application would be great.
It depends on the AMI selected in the launch configuration.
You need to create an AMI of your live EC2 instance after you have updated all the required software, databases and configuration and verified that everything works properly.
Then add this AMI to the Auto Scaling launch configuration.
You don't need to create an AMI for each deployment.
Whenever you make changes on the EC2 server or update your app's source code, you need to create a new AMI and specify that AMI in the Auto Scaling launch configuration.
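The "specify the new AMI in the launch configuration" step could look roughly like the sketch below. Launch configurations cannot be edited in place, so it creates a new one and points the group at it. It assumes a boto3 Auto Scaling client is passed in (for example `boto3.client("autoscaling")`); the function name and the `t3.small` default are hypothetical.

```python
def roll_out_ami(autoscaling, group_name, config_name, image_id, instance_type="t3.small"):
    """Point an Auto Scaling group at a freshly baked AMI.

    `autoscaling` is assumed to be a boto3 Auto Scaling client.
    `config_name` must be a name that is not already in use.
    """
    # Launch configurations are immutable, so register a new one for the new AMI.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName=config_name,
        ImageId=image_id,
        InstanceType=instance_type,
    )
    # Instances launched from now on use the new configuration;
    # instances that are already running are left untouched.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        LaunchConfigurationName=config_name,
    )
    return config_name
```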
Best practice is to configure Auto Scaling with a user data script. When a new instance boots during scaling, it reads the user data (via cloud-init or Upstart). The user data script can pull the code from Git or whatever source control you use and run the necessary pre-deployment commands.
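A user data script along those lines might be assembled as in this sketch. The repository URL, the application directory and the use of `yum` (an Amazon Linux style image) are assumptions made for illustration, and the instance would still need credentials for a private repository. The second helper base64-encodes the script, which is the form launch template data expects.

```python
import base64
import shlex


def build_user_data(repo_url, app_dir="/var/www/app"):
    """Return a bash script that fetches the application on first boot.

    The package manager and paths are placeholders for this example.
    """
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "yum install -y git",
        f"git clone --depth 1 {shlex.quote(repo_url)} {shlex.quote(app_dir)}",
        f"cd {shlex.quote(app_dir)}",
        "# Application-specific pre-deployment commands go here.",
    ]
    return "\n".join(lines) + "\n"


def launch_template_data(image_id, instance_type, user_data):
    """Build a LaunchTemplateData dict with the script base64-encoded."""
    encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
    return {"ImageId": image_id, "InstanceType": instance_type, "UserData": encoded}
```

The resulting dict could then be passed as `LaunchTemplateData` to a boto3 EC2 client's `create_launch_template` call. Because the code is pulled at boot, a new deployment does not require a new AMI.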
I created an Elastic Load Balancer in front of two EC2 instances. However, I discovered an issue that requires me to update the code on both EC2 instances.
I can access each instance individually to update the code via GitHub, or I could create an AMI to launch a new instance. Neither option is appealing.
How can I synchronize code between the two EC2 instances?
In situations like this, a code pipeline would be helpful or, better yet, you could switch to Elastic Beanstalk.
We have used Elastic Beanstalk to create our EC2 instances. Is it possible to create a new EC2 instance from the existing instance's image when the existing instance is terminated for any reason? Can we achieve this through configuration?
I don't think this is possible.
As soon as you send a terminate request on your EC2 instance, the IP and hardware (disk and other resources) are released.
If you are trying to do this programmatically, I'd suggest you create an AMI before sending a terminate request.
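Done programmatically, the ordering matters: the image has to exist before the terminate request goes out. A minimal sketch, assuming a boto3 EC2 client is passed in and using a hypothetical function name:

```python
def image_then_terminate(ec2, instance_id, image_name):
    """Create an AMI from an instance, wait for it, then terminate the instance.

    `ec2` is assumed to be a boto3 EC2 client.
    """
    image_id = ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]
    # Only terminate once the image is available, otherwise the disk
    # may be released before the AMI has been captured.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    ec2.terminate_instances(InstanceIds=[instance_id])
    return image_id
```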
You can create an AMI from the EC2 instance before you terminate it, and then create a new Elastic Beanstalk environment using this AMI. However, it's not advisable, as you'll lose future version upgrades of that AMI as performed by Amazon.
I advise using the .ebextensions folder mechanism supplied by Elastic Beanstalk to alter new instances as they are spawned (see documentation).
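As a rough idea of what such a configuration file contains, the sketch below writes one from Python. Elastic Beanstalk accepts `.config` files in YAML or JSON, so JSON is used here to stay within the standard library. The file name, the installed package and the example command are placeholders, not something taken from the question.

```python
import json
from pathlib import Path


def write_ebextension(project_root, name="01_setup.config"):
    """Write a small .ebextensions config file into the application bundle."""
    config = {
        # Packages to install on every new instance (placeholder choice).
        "packages": {"yum": {"git": []}},
        # Commands run on the instance during provisioning (placeholder command).
        "commands": {
            "01_mark_provisioned": {
                "command": "echo provisioned >> /tmp/provision.log"
            }
        },
    }
    target = Path(project_root) / ".ebextensions" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n")
    return target
```

The `.ebextensions` folder is deployed with the application source, so every instance the environment spawns applies the same customisation without a custom AMI.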
I am quite new to Amazon services and have started reading about EMR. I am more or less familiar with OpenStack. I just want someone to tell me, in short, what plays the role of OpenStack's Compute, Controller and Cinder (storage) in the Amazon cloud.
For example, Cinder is storage for OpenStack and, likewise, S3 is the storage in the Amazon cloud.
What are the other two, compute and controller, in the Amazon cloud?
Also, can someone please explain in simple words the relation between EMR and EC2, or are they entirely different?
Even in EMR we use EC2 instances, so why are people comparing Hadoop on EC2 vs Elastic MapReduce, as in the following link?
Hadoop on EC2 vs Elastic Map Reduce
Thanks a ton in advance :)
OpenStack is open source software that you can set up on your own infrastructure so that you can run managed services like Amazon's.
Amazon is its own independent service with its own proprietary implementation, and they basically sell the service.
OpenStack has several components that map roughly one-to-one to AWS services:
Controller -> Amazon Console
Cinder (block storage) -> EBS
Storage (object storage, Swift in OpenStack) -> S3
Compute -> EC2
EMR (Elastic MapReduce) is just another Amazon service that allows you to run Hadoop jobs. EMR runs on top of EC2, so when you create an EMR cluster it uses EC2 as its underlying service.
You can also run Hadoop independently of EMR on EC2 instances. The downside is that you have to manage all the Hadoop installation and configuration yourself (Cloudera Manager is pretty helpful for this). The advantage is that it allows you to tweak the Hadoop stack as much as you want.
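To make the EMR-on-EC2 relationship concrete, here is a minimal sketch of a cluster request. The instance type, release label and the default role names are assumptions for illustration (the roles must already exist in the account), and `emr` is assumed to be a boto3 EMR client such as `boto3.client("emr")`.

```python
def emr_cluster_request(name, node_count, instance_type="m5.xlarge"):
    """Build the arguments for an EMR cluster running Hadoop.

    Note that the request is expressed in terms of EC2 instance types
    and counts: EMR provisions those EC2 instances on your behalf.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick a current one
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": node_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


def start_cluster(emr, name, node_count):
    """Submit the request and return the new cluster's ID."""
    return emr.run_job_flow(**emr_cluster_request(name, node_count))["JobFlowId"]
```

With plain Hadoop on EC2, everything EMR does behind this one call (launching the instances, installing and configuring Hadoop) becomes your responsibility.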
Hope this helps.
I have set up a Hadoop cluster on Amazon EC2 using Cloudera Manager. Cloudera Manager created two instances and everything is working as expected. I am trying to stop the Cloudera-created instances through the AWS console, but there is no option to stop them. We only have "Terminate" and "Reboot". I don't want to terminate these instances because I want to reuse them.
How can I stop these instances?
Since your instances came from an instance-store-backed AMI, you will only be able to reboot and terminate them. Look in the Management Console under "root device" to confirm this is the case.
To get around this, you can create an EBS-backed AMI from your instances and then relaunch your environment using the new AMI, which would give you the option to stop your instances.
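The same check can be done from code instead of the console. This sketch assumes a boto3 EC2 client is passed in and uses a hypothetical function name; it reads each instance's `RootDeviceType`, which is `ebs` for instances that can be stopped and `instance-store` for those that cannot.

```python
def stoppable_instances(ec2, instance_ids):
    """Map each instance ID to True if it is EBS-backed (and so can be stopped).

    `ec2` is assumed to be a boto3 EC2 client.
    """
    response = ec2.describe_instances(InstanceIds=instance_ids)
    result = {}
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            # Only EBS-backed instances support the stop action.
            result[instance["InstanceId"]] = instance["RootDeviceType"] == "ebs"
    return result
```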