I have written a small Java tool to benchmark NoSQL databases. Because I don't have enough computers, I want to run the benchmark tool and some database nodes on Amazon EC2.
Is that possible?
-> Can I deploy a Java app on EC2 without any further configuration?
Thank you
Can I deploy a Java app on EC2 without any further config?
Yes. If you were running a typical web app, you might investigate Elastic Beanstalk, but that wouldn't work for benchmarking.
EC2 computers are just computers, except instead of installing the OS manually, you select a pre-installed OS image to boot from, called an AMI. You could look around for an image with Java pre-installed, but it's fairly easy to boot your favorite Ubuntu/Fedora/CentOS/Amazon Linux image and run "apt-get install java" or "yum install java".
At first, you'll upload your program to the box and SSH in to test it. But when you get a workflow going, it's better to upload your program to S3, then have the box download it at boot. (S3 is usually faster than your upload speed, and more reliable.)
If you have just a "tiny" bit of config to do at boot, you can use cloud-init. This will run a pre-defined script at boot. (Just put the commands in the EC2 user-data config at boot.) It could be as simple as 3 commands: install java, download my app, run my app.
For more sophisticated operations, you'll want to use Chef, Puppet, or Ansible to orchestrate multiple servers.
But for something simple like your benchmarking idea, you can easily "roll your own" using the AWS API. Use a library (Boto for Python, Fog for Ruby; I'm sure there are several for Java) to write a program that does the following:
1) Launch an instance with a cloud-init script that installs a NoSQL DB.
2) Wait for it to get an IP.
3) Launch another instance with a cloud-init script that configures your Java test program, passing in the IP from step 2.
4) Wait for everything to run, then collect the run info (or have the info stored in S3 so you can collect it later).
5) Clean up by terminating the instances (it helps to tag them so cleanup is easier).
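The steps above can be sketched in Python with boto3 (the current successor to the Boto library). Everything concrete here is a placeholder for illustration: the AMI ID, instance type, bucket name, jar name, and the choice of MongoDB as the NoSQL DB.

```python
AMI_ID = "ami-12345678"         # placeholder: your preferred Linux AMI
BUCKET = "my-benchmark-bucket"  # placeholder S3 bucket

def db_user_data():
    """Cloud-init script for step 1: install and start a NoSQL DB (MongoDB as an example)."""
    return "\n".join([
        "#!/bin/bash",
        "yum -y install mongodb-server",
        "service mongod start",
    ])

def bench_user_data(db_ip):
    """Cloud-init script for step 3: install Java, fetch the benchmark jar, run it against db_ip."""
    return "\n".join([
        "#!/bin/bash",
        "yum -y install java",
        f"aws s3 cp s3://{BUCKET}/benchmark.jar /tmp/benchmark.jar",
        f"java -jar /tmp/benchmark.jar --db-host {db_ip} > /tmp/results.txt",
        f"aws s3 cp /tmp/results.txt s3://{BUCKET}/results/",  # step 4: results land in S3
    ])

def run_benchmark():
    import boto3  # imported here so the user-data helpers above work without boto3 installed
    ec2 = boto3.resource("ec2")
    # 1) launch the DB node with its cloud-init script
    db = ec2.create_instances(
        ImageId=AMI_ID, InstanceType="m3.medium", MinCount=1, MaxCount=1,
        UserData=db_user_data(),
        TagSpecifications=[{"ResourceType": "instance",
                            "Tags": [{"Key": "purpose", "Value": "benchmark"}]}],  # tag for step 5
    )[0]
    # 2) wait for it to get an IP
    db.wait_until_running()
    db.reload()
    # 3) launch the benchmark node, passing in the DB's IP
    bench = ec2.create_instances(
        ImageId=AMI_ID, InstanceType="m3.medium", MinCount=1, MaxCount=1,
        UserData=bench_user_data(db.private_ip_address),
    )[0]
    # 4) poll S3 for the results file (not shown), then 5) clean up
    for instance in (db, bench):
        instance.terminate()
```

Running `run_benchmark()` needs AWS credentials configured; the user-data helpers are plain string builders, so you can inspect what each box will execute before you spend any money.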
You could do all this manually, but when you find a bug, you'll want to re-run everything, and automation will make that a breeze. Plus, you'll want to repeat your findings on various instance sizes.
Once you get things working, you can switch to spot instances when running your actual benchmarks: They take longer to launch, but can save a ton of money. So spot instances are annoying for development, but perfect for running bulk tests where you don't care about the launch time.
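Switching to spot instances is mostly a matter of making a spot request instead of an on-demand launch. A hedged sketch with boto3, where the price, AMI, and instance type are placeholders (note that, unlike `run_instances`, the spot API wants the user-data base64-encoded):

```python
import base64

def spot_launch_spec(ami_id, instance_type, user_data):
    """Build the LaunchSpecification for a spot request; user-data must be base64-encoded here."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "UserData": base64.b64encode(user_data.encode()).decode(),
    }

def request_spot(max_price="0.05"):
    import boto3  # needs AWS credentials configured
    ec2 = boto3.client("ec2")
    resp = ec2.request_spot_instances(
        SpotPrice=max_price,  # the most you're willing to pay per instance-hour
        InstanceCount=1,
        LaunchSpecification=spot_launch_spec("ami-12345678", "m3.medium",
                                             "#!/bin/bash\n# your cloud-init script here"),
    )
    return resp["SpotInstanceRequests"][0]["SpotInstanceRequestId"]
```

You'd then poll the returned request ID until it's fulfilled, which is exactly the "takes longer to launch" trade-off mentioned above.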
You can think of EC2 as just a set of computers that you can rent time on. You have total control over the EC2 VMs, and can install and run almost any software you want on them, including database servers and your java app.
You'll probably find the practical limitation is the amount of time you want to spend setting them up. You'll need to sign up for an Amazon account, set up your instances, install an OS, install DB servers, install your java app, etc...
See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html to get going.
Related
Our infrastructure is getting pretty complex with many moving pieces so I'm setting up Vagrant with Ansible to spin up development environments.
My question is which tool (Vagrant or Ansible or another) should be responsible for starting various services, such as
rails s (for starting the Rails server)
nginx
nodejs (for a separate API)
I think the answer you're looking for is Ansible (or another tool).
Vagrant can run scripts and start services, but once you add a configuration management tool, that tool should do exactly that: starting and managing services is part of its job.
You want the same application configuration regardless of the machine you're spinning up (ESXi, Amazon EC2, Vagrant, whatever), and the best way to do that is outside of Vagrant.
My organization's website is a Django app running on front end webservers + a few background processing servers in AWS.
We're currently using Ansible for both:
system configuration (from a bare OS image)
frequent manually-triggered code deployments.
The same Ansible playbook is able to provision either a local Vagrant dev VM, or a production EC2 instance from scratch.
We now want to implement autoscaling in EC2, and that requires some changes towards a "treat servers as cattle, not pets" philosophy.
The first prerequisite was to move from a statically managed Ansible inventory to a dynamic, EC2 API-based one. Done.
The next big question is how to deploy in this new world where throwaway instances come up and down in the middle of the night. The options I can think of are:
Bake a new fully-deployed AMI for each deploy, create a new AS launch config, and update the AS group with it. This sounds very cumbersome, but it's also very reliable because of the clean-slate approach, and it ensures that any system changes the code requires will be in place. Also, no additional steps are needed on instance boot, so it's up and running more quickly.
Use a base AMI that doesn't change very often, automatically get the latest app code from git upon bootup, and start the webserver. Once it's up, just do manual deploys as needed, like before. But what if the new code depends on a change in the system config (new package, permissions, etc.)? It looks like you then have to start taking care of dependencies between code versions and system/AMI versions, whereas the "just do a full Ansible run" approach was more integrated and more reliable. Is it more than just a potential headache in practice?
Use Docker? I have a strong hunch it could be useful, but I'm not sure yet how it would fit our picture. We're a relatively self-contained Django front-end app with just RabbitMQ + memcache as services, which we're never going to run on the same host anyway. So what benefit is there in building a Docker image with Ansible that contains system packages + the latest code, rather than having Ansible just do it directly on an EC2 instance?
How do you do it? Any insights / best practices?
Thanks!
This question is very opinion-based, but just to give you my take: I would go with pre-baking the AMIs with Ansible and then using CloudFormation to deploy your stacks with auto scaling, monitoring, and your pre-baked AMIs. The advantage is that if most of the application stack is pre-baked into the AMI, scaling up will happen faster.
Docker is another approach, but in my opinion it adds an extra layer to your application that you may not need if you are already using EC2. Docker can be really useful if, say, you want to containerize applications on a single server. Maybe you have some spare capacity on a server, and Docker will let you run an extra application on it without interfering with the existing ones.
Having said that, some people find Docker useful not as a way to optimize resources on a single server, but as a way to pre-bake their applications into containers. When you deploy a new version or new code, all you have to do is copy/replicate those containers across your servers, then stop the old container versions and start the new ones.
My two cents.
A hybrid solution may give you the desired result. Store the head Docker image in S3, and pre-bake the AMI with a simple fetch-and-run script that executes on start (or pass the script to a stock AMI via user-data). For version control, move the head image pointer to your latest stable version; you could probably also implement test stacks of new versions by making the fetch script smart enough to identify which image version to fetch based on instance tags, which are configurable at instance launch.
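The fetch-and-run script baked into the AMI could be a small Python program along these lines. The bucket, the `myapp-<version>.tar` naming scheme, and the `docker_version` tag name are all made up for illustration:

```python
import subprocess
import urllib.request

S3_PREFIX = "s3://my-bucket/images"  # placeholder bucket

def image_key(version):
    """Map a 'docker_version' tag value to the S3 path of the saved image tarball."""
    return f"{S3_PREFIX}/myapp-{version}.tar"

def instance_version_tag(default="stable"):
    """Read this instance's 'docker_version' tag from the EC2 API; default to the head image."""
    # the instance's own ID comes from the EC2 metadata service
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).read().decode()
    import boto3  # imported here so image_key() stays usable without boto3 installed
    tags = boto3.client("ec2").describe_tags(Filters=[
        {"Name": "resource-id", "Values": [instance_id]},
        {"Name": "key", "Values": ["docker_version"]},
    ])["Tags"]
    return tags[0]["Value"] if tags else default

def fetch_and_run():
    version = instance_version_tag()
    subprocess.check_call(["aws", "s3", "cp", image_key(version), "/tmp/app.tar"])
    subprocess.check_call(["docker", "load", "-i", "/tmp/app.tar"])
    # assumes the tarball was saved from an image tagged myapp:<version>
    subprocess.check_call(["docker", "run", "-d", f"myapp:{version}"])
```

An instance launched with `docker_version=v42` in its tags would pull and run that image; untagged instances fall back to the stable head.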
You can also use AWS CodeDeploy with Auto Scaling and your build server. We use the CodeDeploy plugin for Jenkins.
This setup allows you to:
perform your build in Jenkins
upload to S3 bucket
deploy one by one to all the EC2 instances that are part of the assigned AWS Auto Scaling group.
All that with a push of a button!
Here is the AWS tutorial: Deploy an Application to an Auto Scaling Group Using AWS CodeDeploy
My company has thousands of server instances running application code - some instances run databases, others are serving web apps, still others run APIs or Hadoop jobs. All servers run Linux.
In this cloud, developers typically want to do one of two things to an instance:
Upgrade the version of the application running on that instance. Typically this involves a) tagging the code in the relevant subversion repository, b) building an RPM from that tag, and c) installing that RPM on the relevant application server. Note that this operation would touch four instances: the SVN server, the build host (where the build occurs), the YUM host (where the RPM is stored), and the instance running the application.
Today, a rollout of a new application version might be to 500 instances.
Run an arbitrary script on the instance. The script can be written in any language, provided the interpreter exists on that instance. E.g., the UI developer wants to run his "check_memory.php" script, which does x, y, z, on the 10 UI instances and then restarts the webserver if some conditions are met.
What tools should I look at to help build this system? I've seen Celery and Resque and delayed_job, but they seem like they're built for moving through a lot of tasks. This system is under much less load: maybe on a big day a thousand upgrade jobs might run, plus a couple hundred executions of arbitrary scripts. Also, those tools don't support tasks written in arbitrary languages.
How should the central "job processor" communicate with the instances? SSH, message queues (which one), something else?
Thank you for your help.
NOTE: this cloud is proprietary, so EC2 tools are not an option.
I can think of two approaches:
Set up password-less SSH on the servers, keep a file that contains the list of all machines in the cluster, and run your scripts directly over SSH, for example: ssh user@foo.com "ls -la". This is the same approach used by Hadoop's cluster startup and shutdown scripts. If you want to assign tasks dynamically, you can pick nodes at random.
Use something like Torque or Sun Grid Engine to manage your cluster.
The package installation can be wrapped inside a script, so you just need to solve the second problem, and use that solution to solve the first one :)
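The first approach can be sketched as a small fan-out runner in Python; it just shells out to `ssh` for each host, so it works for package installs and arbitrary scripts alike (the host file name is up to you):

```python
import subprocess

def run_on_hosts(hosts, command):
    """Run `command` on each host via password-less SSH; return {host: (exit_code, stdout)}."""
    results = {}
    for host in hosts:
        proc = subprocess.run(
            # BatchMode=yes makes ssh fail fast instead of prompting for a password
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True,
        )
        results[host] = (proc.returncode, proc.stdout)
    return results
```

Usage would be something like `run_on_hosts(open("cluster_hosts.txt").read().split(), "sudo yum -y upgrade myapp")`, with the host list file playing the role of the cluster inventory mentioned above.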
I am currently experimenting with Amazon EC2 and use the standard EC2 console. The web app is OK, but I want a better solution. I want to be able to SSH into the instances, monitor them, possibly attach a debugger, etc. Are there any better alternatives to this tool?
You should be able to log in to any EC2 instance via SSH using key files and work with it as if it were an ordinary server. To do so, create a key pair, download the private key to your local machine, and make sure you select that key pair when launching a new instance. You are free to install any software you like on the instance, so how you monitor your instance is completely up to you (if you decide not to use the AWS console).
Apart from the web console, there are also the Amazon EC2 API tools (a bunch of ec2-* scripts to be run from a Linux console) and the Query API. The latter is considered the most flexible way to manage your cloud infrastructure. There are bindings for EC2 in many scripting languages, including Python (boto), Perl (Net::Amazon::EC2), Ruby (amazon-ec2 gem), and node.js (aws2js).
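For example, with the Python binding (boto in this answer's day; boto3 is its current successor) you can list your running instances from a script instead of the console. A minimal sketch:

```python
def extract_instances(resp):
    """Pull (instance ID, public DNS) pairs out of a DescribeInstances response."""
    return [(i["InstanceId"], i.get("PublicDnsName", ""))
            for r in resp["Reservations"] for i in r["Instances"]]

def running_instances():
    import boto3  # needs AWS credentials configured
    ec2 = boto3.client("ec2")
    return extract_instances(ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]))

# extract_instances works on a canned response too, which is handy for testing:
sample = {"Reservations": [{"Instances": [
    {"InstanceId": "i-0123456789abcdef0",
     "PublicDnsName": "ec2-203-0-113-1.compute-1.amazonaws.com"}]}]}
print(extract_instances(sample))
```

From there it's a small step to scripting SSH sessions or monitoring checks against the returned DNS names.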
Beyond that, there's no single better solution, because EC2 is an IaaS service and is meant to be equally good for almost any task. For your particular needs, you'll have to develop or assemble your own environment that suits them.
Edit:
As of today, it is possible to log in to running EC2 Linux instances from the AWS web console:
Our third announcement today is about a new feature in the AWS console that makes it even easier for you to use Amazon EC2 Linux instances. Customers have been asking us to enable the ability to log into their instances directly from the AWS console. Starting today, you can log in to your Linux instances from the EC2 console without the need to install additional software clients. Please see the Amazon EC2 Getting Started Guide for details on how to use this new functionality.
When autoscaling my EC2 instances for an application, what is the best way to keep every instance in sync?
For example, there are custom settings and application files like the ones below...
Apache httpd.conf
php.ini
PHP source for my application
To get my autoscaling working, all of these must be configured identically on each EC2 instance, and I want to know the best practice for syncing these elements.
You could use a private AMI which contains scripts that install software or check out the code from SVN, etc. The second possibility is to use a configuration management framework like Chef or Puppet.
The way this works with Amazon EC2 is that you can pass user-data to each instance, generally a script of some sort that runs commands, e.g. for bootstrapping. As far as I can see, CreateLaunchConfiguration allows you to define that as well.
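As a sketch, here's how user-data and a launch configuration fit together with boto3. The AMI ID, the launch configuration name, and the SVN URL are hypothetical placeholders; the point is that every instance the Auto Scaling group launches runs the same bootstrap script, which keeps the config and code in sync:

```python
def bootstrap_script():
    """User-data run on every instance the group launches: install, check out code, start."""
    return "\n".join([
        "#!/bin/bash",
        "yum -y install httpd php",
        "svn checkout http://svn.example.com/myapp /var/www/html",  # hypothetical repo URL
        "service httpd start",
    ])

def create_launch_config():
    import boto3  # needs AWS credentials configured
    autoscaling = boto3.client("autoscaling")
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="myapp-lc-v1",  # placeholder name
        ImageId="ami-12345678",                 # placeholder AMI
        InstanceType="m3.medium",
        UserData=bootstrap_script(),
    )
```

To roll out a change you'd create a new launch configuration (e.g. myapp-lc-v2) with an updated script and point the Auto Scaling group at it.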
If running this yourself is too much of an obstacle, I'd recommend a service like:
Scalarium
RightScale
Scalr (also open source)
They all offer some form of scaling.
HTH