Cookbooks vs manually setting up statsD/metric collection servers? - amazon-ec2

I am in the process of setting up a dedicated ec2 instance as a statsD server. I was wondering if there's a best practice around this. Allow me to elaborate. When dealing with cloud infrastructure, I found Terraform to be extremely useful. All the infrastructure you need is expressed as code, and any changes to this code base of Terraform modules can be tracked effectively. It also makes sense to have it in the same repository as your source code, so whenever CD kicks in, we can make sure our infrastructure is updated as and when needed.
I have a similar question around a statsd server. I came across the Chef configuration management tool, but given the size of our operation at this stage it feels like overkill. I am curious to know what people do for such servers. Do they prefer managing these manually? Or is there a way to express it as code, like Chef? Or probably something else I don't know about.

At the risk of sounding opinionated, I would say that maintaining configuration with a configuration management tool such as Chef is a good idea regardless of the size of the infrastructure.
However, for your situation, you should evaluate the points below:
Although your current requirement is a single statsd server, do you foresee a requirement for additional machines?
Configuration management tools like Chef are themselves components of your infrastructure, which need to be set up as well. Is that feasible for your current requirement, or for something in the near future?
If your concern is the effort of automating it yourself, in most cases you will be able to reuse community work, such as the statsd cookbook from Chef Supermarket.

Related

Managing configuration files for multiple instances of the same application (same environment)

I have multiple instances of the same engine running as Windows services on the same environment and system; they differ only in their connection strings, as they point to different queues. Other than a couple of lines in the config (XML), the rest of the application is exactly the same (config and binaries). When config changes are made, they have to be applied to all instances, which is time consuming, so I am doing some research into the best method of managing the config files in a scalable and version-controlled way. Currently I use a batchfile to copy the default engine directory and config over and then find and replace the individual strings. I'd prefer to have a template config that can be updated that pulls in set variables for the connection strings depending on the instance and environment. I understand that this may be possible using chef, puppet or ansible but to my understanding these are more for system configuration as opposed to individual application files? Does anyone know if this is possible with gitlab or AWS? Before committing to the learning curve I'm trying to discern if one of the aforementioned config management tools would be overkill for this scenario or a realistic solution?
I understand that this may be possible using chef, puppet or ansible but to my understanding these are more for system configuration as opposed to individual application files?
Managing individual files, including details of their contents, is a common facet of configuration management. Chef, Puppet, and Ansible can all do this with relative ease.
Does anyone know if this is possible with gitlab or AWS?
No doubt, someone does. And I anticipate, but cannot confirm, that the answer is "yes" for both.
Before committing to the learning curve I'm trying to discern if one of the aforementioned config management tools would be overkill for this scenario or a realistic solution?
A configuration management system would almost certainly be overkill if the particular task you describe is the only thing you are considering them for.
Currently I use a batchfile to copy the default engine directory and
config over and then find and replace the individual strings. I'd
prefer to have a template config that can be updated that pulls in set
variables for the connection strings depending on the instance and
environment.
In the first place, if it ain't broke, don't fix it. On the other hand, if it is broke, and switching to a template-based approach is a reasonable method to resolve the issue, then you can certainly implement that with a for-purpose local script without bringing in all the apparatus of a configuration management system.
In the event that you do decide that the current mechanism needs to be replaced, do, for goodness sake, ditch batchfile. It's one of the worst scripting languages ever inflicted on humanity. PowerShell would be a natural replacement on Windows, but you might also consider Python, or pretty much any programming language you know.
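As a rough illustration of the for-purpose local script mentioned above, here is a minimal Python sketch of the template approach. The XML layout, the placeholder name `connection_string`, the instance names and the queue addresses are all assumptions made up for the example; the real config will look different.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template: only the connection string differs per instance.
CONFIG_TEMPLATE = Template(
    "<configuration>\n"
    "  <connectionStrings>\n"
    '    <add name="Queue" connectionString="$connection_string" />\n'
    "  </connectionStrings>\n"
    "</configuration>\n"
)

# Hypothetical per-instance values, keyed by (environment, instance).
INSTANCES = {
    ("test", "engine-a"): "queue://test-host/orders",
    ("test", "engine-b"): "queue://test-host/invoices",
}


def render_config(environment: str, instance: str) -> str:
    """Return the config text for one instance of the engine."""
    raw = INSTANCES[(environment, instance)]
    # Escape characters that are not allowed inside an XML attribute value.
    safe = escape(raw, {'"': "&quot;"})
    # substitute() raises KeyError if a placeholder has no value.
    return CONFIG_TEMPLATE.substitute(connection_string=safe)


if __name__ == "__main__":
    for environment, instance in INSTANCES:
        print(f"--- {environment}/{instance} ---")
        print(render_config(environment, instance))
```

The template and the table of per-instance values can both live in version control, so a config change is made once in the template and re-rendered for every instance.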

How do you release Microservices?

The question is tied more to CI/CD practices and infrastructure. In the release process we follow, we club a set of microservice Docker image tags together as a single release, run it through the CI/CD pipeline, and promote that version.yaml to staging and production, a sort of mono-release pattern. The problem with this is that at some point we need to serialize, and other changes have to wait until a mono-release is tested and tagged as ready for the next stage. A little more description regarding this here.
An alternative would be the micro-release strategy, where each microservice is released in parallel to production through the CI/CD pipeline. But then would this mean that there would be as many pipelines as there are microservices? Another alternative could be a single pipeline with parallel test cases and a polling CD, sort of the GitOps way, which takes the latest production-tagged Docker images.
There seems precious little information regarding the way MS is released. Most talk about interface level or API level versioning and releasing, which is not really what I am after.
Assuming your organization develops services in a microservices architecture and deploys to a Kubernetes cluster, you must use some CD (continuous delivery) tool to release new microservices, or even to update one.
Take a look at tools like Jenkins (https://www.jenkins.io) or DroneIO (https://drone.io). Some organizations use scripts in Python, Go and so on. Personally, I do not like this approach; I think the best solution is to pick a tool from the Continuous Integration & Delivery group of the CNCF Landscape (https://landscape.cncf.io/zoom=150), since these are tools that are tested and used in the market.
An alternative would be the micro-release strategy, where each microservice is released in parallel to production through the CI/CD pipeline. But then would this mean that there would be as many pipelines as there are microservices?
That is OK. In some tools you can have a parameterized pipeline that builds projects based on the parameters it receives, but I think the best solution is to have one pipeline per service, plus some parameterized pipelines to deploy, apply specific tests, archive assets and so on, as in the micro-release strategy you describe.
Agreed, there is little information about this out there. From all I understand, the approach of keeping one pipeline per service sounds reasonable. With a growing number of microservices you will run into several problems:
how do you keep track of changes in the configuration
how do you test your services efficiently with regression and integration tests
how do you efficiently set up environments
The key here is most probably to make better use of parameterized environment variables, which you then version in an efficient manner. This will allow you to keep track of the changes. To achieve this, make sure to a.) strictly parameterize all variables in the container configs and the code and b.) organize the config variables in a way that allows you to inject them at runtime. This is a piece of content that I found helpful in regard to my point a.).
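To make points a.) and b.) a little more concrete, here is a minimal Python sketch of a service that takes all of its configuration from environment variables injected at runtime and fails fast when one is missing. The variable names `DATABASE_URL` and `QUEUE_URL` are hypothetical, chosen only for illustration.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; a real service would list its own.
REQUIRED_KEYS = ("DATABASE_URL", "QUEUE_URL")


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read every required setting from the environment, or fail loudly."""
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise RuntimeError(
            "missing configuration variables: " + ", ".join(missing)
        )
    # Nothing is hard-coded: the same image runs in any environment.
    return {key: env[key] for key in REQUIRED_KEYS}
```

Because the function accepts any mapping, tests can pass a plain dict instead of touching the real process environment.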
As for point b.), this is slightly more tricky. It looks like you are using Kubernetes, so you might just want to pick something like Helm charts. The question is how you structure your config files, and you have two options:
Use something like Kustomize, a configuration management tool that will allow you to version to a certain degree, following a GitOps approach. This comes (in my biased opinion) with a good number of flaws. Git is ultimately not meant for configuration management: it's hard to follow changes, to build diffs, and to identify the relevant history if you handle that number of services.
Use a Continuous Delivery API (I work for one, so make sure you question this sufficiently). CDAPIs connect to all your systems: CI pipelines, clusters, image registries, external resources (DBs, file storage), internal resources (elastic, redis) and so on. They dynamically inject environment variables at runtime and create the manifests with each deployment. They cache these as so-called "deployment sets". Deployment sets are the representation of the state of an environment at deployment time. This approach has several advantages: it allows you to share, version, diff and relaunch any state that any service or application was in at any given point in time. It provides a very clear and bulletproof audit of anything in the setup. QA environments or feature-test environments can be spun up through the API or UI, allowing for fully featured regression and integration tests.

Puppet vs Ansible - why would organisation use both?

I have worked in an organisation where we used both puppet and ansible for configuration management... but I always wondered why would they use both tools ... what can puppet do that Ansible cannot do?
The only thought that came to my mind was:
- Puppet was used to check if the system is in the desired state at regular intervals; while Ansible was used to deploy one time things (code, scripts, packages etc)
Can someone please explain why would an organisation use both the tools? Can regular config check be done by Ansible?
Cheers
In the interest of full disclosure, I'm an upstream community contributing developer to Ansible but I will do my best to keep my response neutral.
I think this is largely opinionated and you'll get varied results depending on who you talk to but I think about it effectively like this:
Ansible is an automation tool and Puppet is a configuration management tool. I don't consider them to be direct competitors the way they seem to get compared by tech journalists, except for the fact that there's some overlap in their abilities to perform the functions you would want out of a configuration management tool: service/system state, configuration file templating, application lifecycle management, etc.
The main place where I see these tools in a completely different light is that Ansible performs automation of tasks, and those tasks can be one of many "types" of things that you don't really expect from a configuration management tool, such as IaaS provisioning (AWS, GCE, Azure, RAX, Linode, etc), physical network configuration (Cisco IOS/ASA, JunOS, Arista, VyOS, Netscaler, etc), virtual machine creation/management, physical load balancer configuration (F5 BigIP) and the list goes on. Effectively, Ansible is your "automation glue" to create and automate a process that you and your team might have otherwise had to do by hand. As a tool it gets compared to things like Puppet, Chef, and SaltStack because some of the many "types" of task you would automate more or less add up to configuration management.
On the flip side though, configuration management tools such as Puppet generally have a daemon running on the nodes, which needs to be provisioned/bootstrapped (maybe with Ansible), which has its advantages and disadvantages (which I won't debate here, it's largely out of scope). One thing that daemon provides you is continuous eventual consistency. You can set configuration management authoritatively on the Puppet Master and then the agent will maintain that state on the systems and will provide reporting when it has to change something, which can be wired up to alert monitoring to notify you when something's wrong. While Ansible will also report when something needed changing, it only does this when you run the Ansible Playbook. It's a push model and not a pull model (nor is it a continuously running daemon that will enforce system state). This has its advantages for reporting and the like. I will note that something like Ansible Tower/AWX can more or less emulate this functionality, but it's not a "baked in" feature. Just something to keep in mind.
Ultimately, I think it boils down to a matter of familiarity of technologies, desired feature set, and if you have a pre-existing investment (both time and money) into a toolchain. If you have been using Puppet for 5 years, there's no real motivation to fork-lift replace it with something else when you can use Ansible to augment it (there's even a puppet module in Ansible) and allow each to play nicely with each other, getting the features you want from both. However, if you're starting from scratch, then I think you may consider actually doing a Pros/Cons or feature comparison for what you really want out of the tool(s) to find out if it's worth the investment of picking up two tools from scratch or finding one that can fulfill all your needs and, while I'm biased towards Ansible in this regard, the choice ultimately lies on the person who's going to have to use the utility to maintain the infrastructure.
A good example of the hybrid approach: I know of a few companies that use Puppet for configuration management and Ansible for the software lifecycle release process, where one of the tasks in their playbooks is literally calling the puppet module to bring all the systems into configuration consistency. The Ansible component in this is to automate/orchestrate between various systems. The basic outline of the process is this: start with removing a group of hosts from the load balancer, ensure database connections have stopped, perform upgrades/migrations, run puppet for configuration/state consistency, and then bring things back online in whatever order they've deemed appropriate. This all happens from a single command (or a click of a button in Tower/AWX).
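The ordering in that outline can be sketched in a few lines of Python. This is not an Ansible playbook and does not use any Ansible or Puppet API; `run_step` is a hypothetical callback standing in for whatever actually performs each step on a host, and the step names are just labels taken from the description above.

```python
from typing import Callable, Sequence

# Steps from the outline above, in the order they must happen.
RELEASE_STEPS = (
    "remove from load balancer",
    "wait for database connections to stop",
    "perform upgrades and migrations",
    "run puppet for state consistency",
    "return to load balancer",
)


def release_group(
    hosts: Sequence[str],
    run_step: Callable[[str, str], None],
) -> None:
    """Apply each step to every host in the group before moving on.

    Any exception raised by run_step stops the release, so a failed
    upgrade never reaches the "return to load balancer" step.
    """
    for step in RELEASE_STEPS:
        for host in hosts:
            run_step(step, host)
```

Calling `release_group(["web1", "web2"], print)` would simply print each step and host in order, which is a cheap way to check the sequence before wiring in real actions.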
Anyhoo, I know that was kind of long winded but hopefully it was helpful.

Heroku-like deployment and environment configuration via EC2

I really like the approach of a 12factor app, which you are kinda forced into, when you deploy an application to Heroku. For this question I'm particularly interested in setting environment variables for configuration, like one would do on Heroku.
As far as I can tell, there's no way to change the ENV for one or multiple instances within the EC2 console (though it seems to be possible to set 5 ENV vars when using Elastic Beanstalk). Therefore my next bet on an Ubuntu-based system would be to use /etc/environment, /etc/profile, ~/.profile or just the export command to set ENV variables.
Is this the correct approach or am I missing something?
And if so, is there a best practice on how to do it? I guess I could use something like Capistrano or Fabric, get a list of servers from the AWS api, connect to all of them and change the mentioned files/call export. Though 12factor is pretty well known, I couldn't find any blog post describing how to handle the ENV for a non-trivial amount of instances on EC2. And I don't want to implement such a thing, if somebody already did it very well and I just missed something.
Note: I want a solution without using elastic beanstalk and I don't care about git push deployment or any other Heroku-like feature, this is solely related to app configuration.
Any hints appreciated, thanks!
Good question. There are many ways you can approach your deployment/environment setup.
One thing to keep in mind is that with Heroku (or Elastic Beanstalk for that matter) you only push the code. Their service takes care of the scalability factor and replication of your services across their infrastructure (once you push the code).
If you are using fabric (or capistrano) you are using a push model too, but you have to take care of all the scalability/replication/fault tolerance of your application.
Having said that, if you are using EC2, in my opinion it's better if you leverage AMIs, Autoscale and Cloudformation for your deployments. This is the beauty of elasticity and Virtualization in that you can think of resources as ephemeral. You can still use fabric/capistrano to automate the AMI builds (I use Ansible) and configure environment variables, packages, etc. Then you can define a Cloudformation stack (with a JSON file) and in it you can add an autoscaling group with your prebaked AMI.
Another way of deploying your app is to simply use the AWS Opsworks service. It's pretty comprehensive and it has a lot of options but it may not be for everybody since some people may want a bit more flexibility.
If you want to go with the 'pull' model you can use Puppet, Chef or CFEngine. In this case you have a master policy server somewhere in the cloud (Puppetmaster, Chef Server or Policy Server). When a server gets spun up, an agent (Puppet agent, Chef Client, CFEngine agent) connects to its master to pick up its policy and then executes it. The policy may contain all the packages and environment variables that you need for your application to function. Again, it's a different model. This model scales pretty well, but it depends on how many agents the master can handle and how you stagger the connections from the agents to the master. If you want to scale to thousands of servers you can load balance multiple masters, or simply run multiple masters. From experience, if you want something really "fast", CFEngine works pretty well; there's a good blog post comparing the speed of Puppet and CFEngine here: http://www.blogcompiler.com/2012/09/30/scalability-of-cfengine-and-puppet-2/
You can also go "push" completely with tools like fabric, Ansible or Capistrano. However, you are constrained by how many simultaneous connections a single server (or laptop) can handle to the thousands of servers that it's trying to push to. This is also constrained by network bandwidth, but hey, you can get creative and stagger your push updates and perhaps use multiple servers to push. Again, it works and it's a different model, so it depends which direction you want to go.
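A minimal Python sketch of what staggering a push could look like: the hosts are split into batches and only one batch is pushed to at a time, with a bounded number of parallel connections. The `push` callable is a hypothetical stand-in for whatever fabric, Ansible or Capistrano would do for a single host, and the default batch and worker sizes are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, TypeVar

T = TypeVar("T")


def batches(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def staggered_push(
    hosts: List[str],
    push: Callable[[str], T],
    batch_size: int = 50,
    max_workers: int = 10,
) -> Dict[str, T]:
    """Push to one batch at a time, with limited parallelism per batch."""
    results: Dict[str, T] = {}
    for batch in batches(hosts, batch_size):
        # A fresh pool per batch: the next batch starts only after
        # every push in the current batch has finished.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for host, outcome in zip(batch, pool.map(push, batch)):
                results[host] = outcome
    return results
```

Spreading the batches over several pushing machines, as suggested above, would amount to giving each machine its own slice of the host list.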
Hope this helps.
If you don't need Beanstalk, you can look at AWS Opsworks (http://aws.amazon.com/opsworks/). It is ideal for web/worker kinds of deployment scenarios. You can pass in any variable from outside the code here (even to Chef recipes).
It might be late, but this is what we are doing.
We have a Python script that takes the env vars as JSON and sends them as POST data to another Python script, which converts those vars to a YAML file.
After that we use a Jenkins multibranch pipeline written in Groovy. Jenkins does all the build, and then CodeDeploy copies those env vars to the EC2 instances running in an autoscaling group.
Of course, we do some manipulation from the YAML to a simple text file so that CodeDeploy can paste it into /etc/environment.
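The last conversion step might look roughly like the Python sketch below. It is only an illustration of the idea, not the poster's actual script: it goes straight from JSON to text and skips the intermediate YAML file, and it assumes simple `NAME="value"` lines are what ends up in /etc/environment. Values containing double quotes or newlines are rejected rather than escaped, since the escaping rules of the consuming system are not something this sketch can vouch for.

```python
import json
import re

# Conservative pattern for variable names (an assumption of this sketch).
NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def env_file_text(json_text: str) -> str:
    """Turn a JSON object of variables into NAME="value" lines."""
    variables = json.loads(json_text)
    lines = []
    for name in sorted(variables):
        value = str(variables[name])
        if not NAME_PATTERN.match(name):
            raise ValueError(f"invalid variable name: {name!r}")
        if '"' in value or "\n" in value:
            raise ValueError(f"unsupported characters in value of {name}")
        lines.append(f'{name}="{value}"')
    return "\n".join(lines) + "\n"
```

Sorting the names keeps the generated file stable between runs, which makes diffs of the deployed configuration easier to read.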

Which is better, Nagios or Sensu? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am unsure about which monitoring framework to use. Currently I am looking at either Nagios or Sensu.
Can anybody give me a good reference which shows a comparison of these two (or any other monitoring tool which may be a good solution)? My main intention is to scale-out on EC2. I am using Opscode Chef for system integration.
One important difference between Nagios and Sensu -
Nagios requires all the configuration for 1) checks, 2) handlers and, most importantly, 3) hosts to be written in configuration files on the Nagios server. This means that each time one of the three is changed (for example new hosts added, old hosts removed) you need to rewrite the configuration files and restart Nagios.
Sensu is almost the same as the above, with one important difference -- when hosts are added or removed from your architecture (as is the case in most auto-scaling cloud deployments) -- the hosts themselves run a sensu-client that "subscribes" to different available checks. So when a new server comes into existence and says "I'm a webserver", the sensu-client running on it will ask the sensu-server "what checks should a webserver run on itself?" and run those.
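The subscription idea can be modelled in a few lines of Python. This is purely a toy model of the concept: it is not Sensu's API or configuration format, and the subscription and check names are made up for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical server-side table: which checks belong to which role.
CHECKS_BY_SUBSCRIPTION: Dict[str, List[str]] = {
    "base": ["check_disk", "check_load"],
    "webserver": ["check_http"],
}


def checks_for(subscriptions: Iterable[str]) -> List[str]:
    """Return the checks a client should run, given what it subscribes to."""
    checks: List[str] = []
    for subscription in subscriptions:
        # Unknown subscriptions simply contribute no checks.
        for check in CHECKS_BY_SUBSCRIPTION.get(subscription, []):
            if check not in checks:
                checks.append(check)
    return checks
```

The point of the model is that no per-host entry exists anywhere: a new machine that announces itself as a "webserver" gets the right checks without the server-side table being edited.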
Other than this, operations wise both Nagios (also Icinga) and Sensu are great and have a lot of facilities for checks, handlers, and visibility through a dashboard (YMMV).
From a little recent experience with Sensu and quite a bit of experience with Nagios I'd say both are excellent choices.
Sensu is definitely the new kid. It has a nice UI and nice API. It does however require Redis and RabbitMQ in your setup to work. So consider if you'll therefore want something to monitor those dependencies outside the sensu monitoring stack. Sonian provide Chef recipes for trying it out too.
https://github.com/sensu/sensu-chef
Nagios has been around for an awfully long time. It's generally packaged for most distros, which makes installation simple, and it has few dependencies. Its track record also means that finding people who know it, or who have used it and can offer advice, is easy. On the other hand the UI is ugly, and programmatic access is often hacky or via third-party add-ons. Chef recipes also exist for Nagios:
https://github.com/bryanwb/chef-nagios
If you have time I'd try both; there is little harm in having two monitoring systems running as a trial. The main thing to focus on, especially in a dynamic EC2 setup, is how easily the monitoring configuration files can be generated by your configuration management tool.
In terms of other tools I'd personally include something to record time series data, for instance requests per second or load over time. Graphs are a great help with monitoring, and can be used to drive alerting via Nagios or similar. Personally I'm a fan of both Ganglia and Graphite while Librato Metrics (https://metrics.librato.com/) is a very nice non-free option.
I tried using Nagios for a while: I got the feeling that the only reason it's common is that 'everyone else uses it', because it's absolutely hideous to work with. It is massively overcomplicated, and making it do anything new is difficult and long-winded: if you find something it doesn't do, you know you're in for a week of swearing at crummy documentation of an archaic design. At the end of all your efforts, when it's all working, it still looks hideous. Scrapping it made me sleep better.
Cacti looks nice, but again it's unnecessarily complex when creating new plugins.
For graphing I'd recommend Munin: it's completely trivial to write new plugins in any language, there are hundreds available, and it looks reasonable. It's incredibly easy to install - one command to install and set one access rule, so works well for automated deployments, easy to wrap into a chef recipe. 2.0 is out soon and addresses most of its shortcomings (in particular adding variable update intervals, zoomable graphs, ssh transport). Munin can talk to Nagios for notifications, or it can do that itself, and it provides a basic dashboard.
For local process/file/service monitoring, monit is simpler and works better than god. I've not tried it with m/monit.
Comparing Sensu and Nagios, my pick would be Sensu.
Below are the main reasons:
1. Easy setup. There is far less restarting of clients, which is a major trouble in a large enterprise.
2. Nagios plugins can be used with the Sensu ecosystem.
3. It is scalable and fits easily into a cloud environment.
Has anyone heard about Zabbix? It has a lot of features and comes as a single package, but I doubt its scalability.
As long as enterprise IT consists of databases, SAP, network devices, webservers, filers, backup libraries and so on, there is barely an alternative to Nagios (or its cousins Icinga and Shinken).
Maybe one day everything will come out of clouds automagically, but for a few years yet there will be static servers (physical or virtual, it doesn't matter) with a defined purpose that stay around for at least a few months. We will still have to monitor interface bandwidth, tablespaces, business processes, database sessions, logfiles and JMX metrics. These are all things where the plugin concept of the Nagios world has an advantage.
