Cloudformation extra when compared with Chef - amazon-ec2

What can I do with Cloudformation that I can not do with chef? It looks like chef support spawning node in the ec2 and I can read back the information of the spawned nodes (ip,..) so why would still need cloudformation for?

Very little, but what it does makes it worth using.
The primary advantage of CloudFormation is that Amazon ties created resources together. In other words if your application comprises a db, four webservers, an autoscaling group, a launch configuration, ingress rules, some VPC subnets, an internet gateway or two, and a VPN connection, you can manage them in a single place, as a CloudFormation stack. Want to shoot them? That's easy. Kill the stack and every resource dies with it.
Sure you could technically do this with Chef and the Amazon API. CloudFormation is little more than a way of generating extremely sophisticated sets of AWS API calls + Magic (tm), so you could always roll your own. Netflix sort of did that with their OS tool, Asgard, and RightScale and other services are more or less that. If those softwares don't meet your needs or if you can't afford them, CloudFormation is a nice supplement to AWS deployments. In fact, you don't even need to rely on it. It's quite simple to launch chef-solo from CloudFormation, which allows you to leverage the advantages of both.

Related

dynamic ec2 resourcing in declarative cloud formation/terraform

We are moving our infrastructure to cloud formation since it's much easier to describe the infrastructure in a nice manner. This works fantastically well for things like security groups, routing, VPCs, transit gateways.
However, we have two issues which we are struggling with and I don't think fit the declarative, infrastructure-as-code paradigm which things like terrafrom and cloud formation are.
(1) We have a business requirement where we run a scheduled batch at specific times in the day. These are very computationally intensive. To save costs, we run these on an EC2 which is brought up at that time, then torn down when the batch is finished. However, this seems to require a temporary change to the terraform/CF files, then a change back. Is there a more native way of doing this?
(2) We dynamically store and allow to be edited by clients their firewalling rules on their load balancer (ALB). This information cannot be stored in the terraform/CF files since it can be changed by clients on demand.
Is there a way of properly doing these things in CF/Terraform?
(1) If you have to use EC2, you could create a Lambda that would start your EC2 instances. Then, create a CloudWatch Event that triggers the Lambda at your specified date / time. For more details you can see https://aws.amazon.com/premiumsupport/knowledge-center/start-stop-lambda-cloudwatch/. Once the job is done, have your EC2 shut itself down using the awssdk or awscli.
Alternatively, you could use AWS Lambda to run your batch job. You only get charged when the Lambda runs. Likewise, create a CloudWatch Event rule that schedules the Lambda.
(2) You could store the firewall rules in your own DB and modify the actual ALB SG rules using the awssdk. I don't think it's a good idea to store these things in Terraform/CF. IMHO Terraform/CF are great for declaring infrastructure but won't be a good solution for resources that are dynamically changing, especially by third parties like your clients.

Using Terraform to provision ECS, Elastic, SQS, Lambda

I am fairly new to Terraform and had a few questions -
- I want to provision a ECS cluster with EC2 launch type running containers. I can do this with Terraform.
- I also need a SQS, Lambda talking to the SQS and feeding data to an Elastic cluster who then will interface with another microservice.
My question is, how should I provision this ?
Is it possible to do all the above in a single ECS cluster using Terraform ?
Would it be better to have the ECS cluster separate and have the SQS, Lambda, Elastic cluster be independent components ? And have the final microservice run as a separate EC2 container (non-ECS) ?
What is the better way of approaching this infrastructure ?
It's a pretty open-ended question so I can provide some guidance based on experience, but please use your best judgement in applying my experience to your individual use case.
What I found in general is that trying to put unrelated things in the same Terraform module (I suppose that's what you mean when you say "components") is asking for trouble. There's a bunch of things that belong together in an ECS cluster - IAM roles and policies, autoscaling group, cluster itself... see for example this module. But SQS, Lambda and Elastic don't feel like they're related. I'd put them separately. Same with your non-ECS microservice.
The other thing I'd want to emphasize is that Terraform is not a deployment tool and should not be used to do that. Terraform shines when defining things that change infrequently and independently. The latter is pretty important because Terraform is not transactional and won't roll back anything if things go wrong. If you need more dynamics and a stronger guarantee of integrity in the AWS world, I'd suggest using Terraform to define a CloudFormation stack and running deployments through the latter.

Fargate vs Lambda, when to use which?

I'm pretty new to the whole Serverless landscape, and am trying to wrap my head around when to use Fargate vs Lambda.
I am aware that Fargate is a serverless subset of ECS, and Lambda is serverless as well but driven by events. But I'd like to be able to explain the two paradigms in simple terms to other folks that are familiar with containers but not that much with AWS and serverless.
Currently we have a couple of physical servers in charge of receiving text files, parsing them out, and populating several db tables with the results. Based on my understanding, I think this would be a use case better suited for Lambda because the process that parses out the text files is triggered by a schedule, is not long running, and ramps down when not in use.
However, if we were to port over one of our servers that receive API calls, we would probably want to use Fargate because we would always need at least one instance of the image up and running.
In terms of containers, and in very general terms would it be safe to say that if the container is designed to do:
docker run <some_input>
Then it is a job for Lambda.
But if the container is designed to do something like:
docker run --expose 80
Then it is a job for Fargate.
Is this a good analogy?
That's the start of a good analogy. However Lambda also has limitations in terms of available CPU and RAM, and a maximum run time of 15 minutes per invocation. So anything that needs more resources, or needs to run for longer than 15 minutes, would be a better fit for Fargate.
Also I'm not sure why you say something is a better fit for Fargate because you "always need at least one instance running". Lambda+API Gateway is a great fit for API calls. API Gateway is always ready to receive the API call and it will then invoke a Lambda function to process it (if the response isn't already cached).
It is important to notice that with Lambda you don't need to build, secure, or maintain a container. You just worry about the code. Now as mentioned already, Lambda has a max run time limit and 3GB memory limit (CPU increases proportionally). Also if it is used sporadically it may need to be pre-warmed (called on a schedule) for extra performance.
Fargate manages docker containers, which you need to define, maintain and secure. If you need more control of what is available in the environment where your code runs, you could potentially use a container (or a server), but that again comes with the management. You also have more options on Memory/CPU size and length of time your run can take to run.
Even for an API server as you mentioned you could put API gateway in front and call Lambda.
As Mark has already mentioned, you can Lambda + API Gateway to expose your lambda function as API.
But lambda has significant limitations in terms of function executions. There are limitations on the programming languages supported, memory consumption and execution time (It was increased to 15 mins recently from the earlier 5 mins). This is where AWS Fargate can help by giving the benefits of both container world and Serverless (FaaS) world. Here you worry only about container (its CPU, memory requirements, IAM policies..) and leave the rest to Amazon ECS by choosing Fargate launch type. ECS will choose the right instance type, manage your cluster, it's auto scaling, optimum utilization.
This is the right analogy, but it is not an exhaustive list to be able to explain the two paradigms.
In general, Lambda is more suitable for serverless applications. Its nature is a function-as-a-service (FaaS). It just does the simple tasks and that’s all. Don’t expect too much more.
It should be considered as the first option for serverless module. But it has more limitations and restrictions. Module architecture elaborated from functional and not-functional requirements, surrounded infrastructure and many other factors.
To make a decision minimum you must review the list of restrictions such as:
Portability
Environment control
Trigger type
Response time
Response size
Process time
Memory usage
These are the main factors. But the list hasn’t covered all the factors and restrictions to consider between both these serverless technologies.
To know more about I recommend this article https://greenm.io/aws-lambda-or-aws-fargate/

Heroku-like deployment and environment configuration via EC2

I really like the approach of a 12factor app, which you are kinda forced into, when you deploy an application to Heroku. For this question I'm particularly interested in setting environment variables for configuration, like one would do on Heroku.
As far as I can tell, there's no way to change the ENV for one or multiple instances within the EC2 console (though it's seems to be possible to set 5 ENV vars when using elastic beanstalk). Therefore my next bet on an Ubuntu based system would be to use /etc/environment, /etc/profile, ~/.profile or just the export command to set ENV variables.
Is this the correct approach or am I missing something?
And if so, is there a best practice on how to do it? I guess I could use something like Capistrano or Fabric, get a list of servers from the AWS api, connect to all of them and change the mentioned files/call export. Though 12factor is pretty well known, I couldn't find any blog post describing how to handle the ENV for a non-trivial amount of instances on EC2. And I don't want to implement such a thing, if somebody already did it very well and I just missed something.
Note: I want a solution without using elastic beanstalk and I don't care about git push deployment or any other Heroku-like feature, this is solely related to app configuration.
Any hints appreciated, thanks!
Good question. There are many ways you can approach your deployment/environment setup.
One thing to keep in mind is that with Heroku (or Elastic Beanstalk for that matter) you only push the code. Their service takes care of the scalability factor and replication of your services across their infrastructure (once you push the code).
If you are using fabric (or capistrano) you are using a push model too, but you have to take care of all the scalability/replication/fault tolerance of your application.
Having said that, if you are using EC2, in my opinion it's better if you leverage AMIs, Autoscale and Cloudformation for your deployments. This is the beauty of elasticity and Virtualization in that you can think of resources as ephemeral. You can still use fabric/capistrano to automate the AMI builds (I use Ansible) and configure environment variables, packages, etc. Then you can define a Cloudformation stack (with a JSON file) and in it you can add an autoscaling group with your prebaked AMI.
Another way of deploying your app is to simply use the AWS Opsworks service. It's pretty comprehensive and it has a lot of options but it may not be for everybody since some people may want a bit more flexibility.
If you want to go 'pull' model you can use Puppet, Chef or CFEngine. In this case you have a master policy server somewhere in the cloud (Puppetmaster, Chef Server or Policy Server). When a server gets spun up, an agent (Puppet agent, Chef Client, Cfengine agent) connects to its master to pick up its policy and then executes it. The policy may contains all the packages and environment variables that you need for your application to function. Again, it's a different model. This model scales pretty well but it depends on how many agents the master can handle and how you stagger the connections from the agents to the master. You can load balance multiple masters too if you want to scale to thousands of servers or you can just simply use multiple masters. From experience, if you want something really "fast" Cfengine works pretty good, there's a good blog comparing the speed of Puppet and CFengine here: http://www.blogcompiler.com/2012/09/30/scalability-of-cfengine-and-puppet-2/
You can also go "push" completely with tools like fabric, Ansible, Capistrano. However, you are constrained by how much a single server (or laptop) can handle multiple connections to thousands of servers that its trying to push to. This is also constrained by network bandwidth, but hey you can get creative and stagger your push updates and perhaps use multiple servers to push. Again it works and it's a different model so it depends which direction you want to go.
Hope this helps.
If you dont need beanstalk, you can look at AWS Opsworks (http://aws.amazon.com/opsworks/). Ideal for Web worker kind of deployment scenerios. You can pass any variable from outside the code here (even Chef recipies)
It's might be late but they what we are doing.
We have python script that take env var in Json and send that to as post data to another python script that convert those vars to ymal file.
After that we use Jenkins pipline groovy using multibranch. Jenkins do all the build and then code deploy copies those env vars to ec2 instanced running in autoscaling.
Off course we are doing some manapulation from yaml to simple text file so code deploy can paste it on /etc/envoirments

Managing server instance identity on EC2

I recently brought up a cluster on EC2, and I felt like I had to invent a lot of things. I'm wondering what kinds of tools, patterns, ideas are out there for how to deal with this.
Some context:
I had 3 different kinds of servers, so first I created AMIs for each of them. The first AMI had zookeeper, so step one in deploying the system was to get the zookeeper server running.
My script then made a note of the mapping between EC2's completely arbitrary and unpredictable hostnames, and the zookeeper server.
Then as I brought up new instances of the other 2 kinds of servers, the first thing I would do is ssh to the new server, and add the zookeeper server to its /etc/hosts file. Then as the server software on each instance starts up, it can find zookeeper.
Obviously this is a problem that lots of people have to solve, and it probably works a little bit differently in different clouds.
Are there products that address this concept? I was pretty surprised that EC2 didn't provide some kind of way to tie your own name to its name.
Thanks for any ideas.
How to do some service discovery on Amazon EC2 seems to have some good options.
I think you might want to look at http://puppetlabs.com/mcollective/introduction/ and the suite of tools from http://puppetlabs.com in general.
From the site:
The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.
Primarily we’ll use it as a means of programmatic execution of Systems Administration actions on clusters of servers. In this regard we operate in the same space as tools like Func, Fabric or Capistrano.
I am fairly certain mcollective was built to solve exactly the problem you are trying to address. But, be forewarned, it's not a DNS-based solution, it's a method of addressing arbitrarily large and arbitrarily tagged groups of hosts.

Resources