I've just learned how to use notifications and subscriptions in Chef to carry out actions such as restarting services if a config file is changed.
I am still learning Chef, so I may just not have got to this section yet, but I'd like to know how to carry out these actions conditionally.
Example 1: if I change a config file for my stand-alone Apache server, I only want to restart the service if we are outside core business hours, i.e. the current local time is between 6pm and 6am. If we are inside core business hours, I still want the restart to happen, but at a later time, outside core hours.
Example 2: if I change a config file for my load-balanced Apache server cluster, I only want to restart the service if a) the load balancer service status is "running" and b) all other nodes in the cluster have their Apache service status as "running", i.e. I'm not taking down more than one node in the cluster at once.
I imagine we might need to put the action in a ruby block that either loops until the conditions are met, sets a flag, or creates a scheduled task to execute later, but I have no idea what to look for to learn how best to do this.
I guess this topic is kind of philosophical. For me, Chef should not have specific state or logic beyond the current node and run. If I wanted to restart at a specific time, I would create a cron job with a conditional and just set the conditional with Chef (something like Debian's /var/run/reboot-required). Then crond would trigger the restart.
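A minimal sketch of that flag-file idea, assuming an apache2 service managed by systemd and a hypothetical flag path /var/run/apache-restart-required (the template source and the cron schedule are placeholders):

template '/etc/apache2/apache2.conf' do
  source 'apache2.conf.erb'
  # don't restart now, just drop a flag file for the deferred restart
  notifies :create, 'file[/var/run/apache-restart-required]', :immediately
end

file '/var/run/apache-restart-required' do
  action :nothing
end

# Chef-managed cron entry that performs the restart only outside core hours
cron 'deferred-apache-restart' do
  hour '18-23,0-5'
  minute '*/15'
  command '[ -f /var/run/apache-restart-required ] && systemctl restart apache2 && rm -f /var/run/apache-restart-required'
end

The cron job, not Chef, decides when the restart actually happens; Chef only records that a restart is required.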
For your second example, the load balancer should have no issues dealing with a restarting Apache backend and failing over to another backend. Given that Chef runs regularly with something called "splay", the probability that no backend is reachable at all is very low, even with only two backends. That said, a reload may be the better option.
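If a reload is enough, the notification could look like this sketch (it assumes a service[apache2] resource is declared elsewhere in the run list):

template '/etc/apache2/conf-available/myapp.conf' do
  source 'myapp.conf.erb'
  # graceful reload instead of a full restart, queued until the end of the run
  notifies :reload, 'service[apache2]', :delayed
end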
I am currently setting up a POC network of dockerised services using DC/OS. As this is a POC, I have had to make many revisions to the containers in order to get things working.
Consequently, when drilling into some services using the DC/OS (v1.8.7) web UI, I can see hundreds of old tasks - the vast majority of which have a status of 'finished'.
I realise that I can filter to just the 'active' containers by clicking on the appropriate tab, but this is not what I am after: I would like to be able to see when a container is staging, and all the history for the finished tasks is worthless to me.
How do I purge DC/OS of these finished tasks, as they are clogging up the UI?
Is there a CLI command for this, or do I have to clear out stuff on the master nodes... or is there a handy plug-in that will manage this for me? I've been looking around both on the web and on the master nodes, but can't work out what I need to do.
In DC/OS 1.8.x there is no UI or CLI method to influence garbage collection. With a custom install, however, you can influence some parameters, such as gc_delay (default value in DC/OS: 2 days); others are left at the Mesos defaults, like gc_disk_headroom (unchanged at 0.1, meaning Mesos aims to keep 10% of the assigned disk as free space).
For the parameters you can change at install time, see the Install Configuration Parameters docs for more details.
I have a Laravel application where the application servers are behind a load balancer. On these application servers, I have cron jobs running, some of which should only be run once (i.e. on only one instance).
I did some research and found that people seem to favor a locking system, where you keep all the cron jobs active on each application box, and when one box goes to process a job, it creates some sort of lock so the others know not to process the same job.
I was wondering if anyone had more details on this procedure in regards to AWS, or if there's a better solution for this problem?
You can build distributed locking mechanisms on AWS using DynamoDB with strongly consistent reads. You can also do something similar using Redis (ElastiCache).
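As a sketch of the DynamoDB option: the usual trick is a conditional write that only succeeds for whichever server gets there first (the table name "cron_locks", the key "lock_key" and the job handler are placeholders; gem: aws-sdk-dynamodb):

require 'aws-sdk-dynamodb'

dynamodb = Aws::DynamoDB::Client.new(region: 'us-east-1')

def acquire_lock?(dynamodb, job_name)
  dynamodb.put_item(
    table_name: 'cron_locks',
    item: { 'lock_key' => job_name, 'acquired_at' => Time.now.to_i },
    # only succeeds if no other server has written this key yet
    condition_expression: 'attribute_not_exists(lock_key)'
  )
  true
rescue Aws::DynamoDB::Errors::ConditionalCheckFailedException
  false # another instance already holds the lock for this run
end

run_report_job if acquire_lock?(dynamodb, "daily-report-#{Time.now.strftime('%F')}")

Including the date (or run timestamp) in the key means each scheduled run gets its own lock; the same idea ports directly to PHP with the AWS SDK.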
Alternatively, you could use Lambda scheduled events to send a request to your load balancer on a cron schedule. Since only one back-end server would receive the request, that server could execute the cron job.
These solutions tend to break when your Auto Scaling group experiences a scale-in event and the server processing the task gets terminated. I prefer to have a small server, like a t2.nano, that isn't part of the cluster, and to schedule cron jobs on that.
Check out this package for a Laravel implementation of the lock system (DB implementation):
https://packagist.org/packages/jdavidbakr/multi-server-event
Also, this pull request solves this problem using the lock system (cache implementation):
https://github.com/laravel/framework/pull/10965
If you need to run something only once globally (so not once on every server) and 'lock' the thing that needs to be run, I highly recommend AWS SQS, because it offers exactly that: every machine runs a cron that tries to fetch a message. If it gets one, it processes it; otherwise it does nothing. All crons are active on all machines, but a message is 'in flight' while one machine has received it, so that specific message cannot be received by another machine.
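A minimal sketch of that polling cron in Ruby (the queue URL and the process_ticket handler are placeholders; gem: aws-sdk-sqs; the same pattern works from PHP):

require 'aws-sdk-sqs'

sqs = Aws::SQS::Client.new(region: 'us-east-1')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/cron-tickets'

# every machine runs this from cron; only the machine that receives the
# message processes it, because the message stays in flight until it is
# deleted (or its visibility timeout expires)
resp = sqs.receive_message(queue_url: queue_url, max_number_of_messages: 1)
resp.messages.each do |msg|
  process_ticket(msg.body) # hypothetical job handler
  sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
end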
I'm building a monitoring service similar to Pingdom, but monitoring different aspects of a system, and I'm using Sidekiq to queue the tasks, which is working well. What I need to do is schedule sending out pings every minute. Rather than using a cron-based system, which would require spinning up a new Ruby instance every minute, I have gone down the route of using Sidetiq (note the different spelling, with a "t"), which uses Sidekiq's own queue to schedule future tasks. This feels like a neat solution, but I am concerned it may not be the most reliable way of scheduling tasks. If there are issues with the system (as there inevitably will be at some point), will this method of scheduling tasks be less reliable than a cron-based method, and why?
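For context, the Sidetiq setup is roughly the following sketch (it assumes the sidekiq and sidetiq gems and a reachable Redis instance; the worker name and the ping call are placeholders):

class PingMonitorWorker
  include Sidekiq::Worker
  include Sidetiq::Schedulable

  # enqueue this job every minute via Sidekiq's own queue
  recurrence { minutely }

  def perform
    PingService.send_pings # hypothetical method that runs one monitoring cycle
  end
end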
Thanks
You give only a short description of your system's needs, but I'll try to guess how it could look:
In the first place, using Sidekiq means that you'll also need an instance of Redis, and it also means you'll need a way to monitor the Sidekiq process (and possibly the Redis server) and restart it in case of failure.
A method based on cron tasks will have fewer requirements and therefore far fewer ways to fail.
cron has been around for a long time; it's battle-tested and very reliable, but it has its drawbacks too.
That said, you can build a system with separate instances of Redis in a master/slave configuration, use Redis Sentinel to implement failover in case of master failure, implement a monitoring/alerting system on this setup (you can use something super simple like Inspeqtor, http://contribsys.com/inspeqtor/, from the Sidekiq author), and also start several instances of Sidekiq on different machines.
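For illustration, a minimal sentinel.conf for such a setup might look like the following sketch (the master address, name and quorum are placeholders):

# watch a master named "mymaster" at 10.0.0.1:6379; 2 sentinels must agree it is down
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1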
With all of that, you can have a reasonably reliable system for running Sidekiq with Sidetiq.
Hope it helps
I'm currently making a watchdog to check whether all bundles in a pipeline are still functioning properly. (This will be in a distributed environment, so a failure can be a network failure, a software failure, one of the servers going down, ...)
Because a bundle can be bound to an arbitrary number N of services, the checking will happen recursively using the following methodology:
START at the first step in the pipeline
Use getServicesInUse to get the service references of the next step
Use getBundle() on the gathered ServiceReference objects
REPEAT until we arrive at the bundle we want to stop at
That way I can get all the bundle objects of the pipeline (I assume). Now, to check whether they are functioning correctly (or just whether they are still reachable), I was wondering if
Bundle b = ...; // e.g. obtained from ServiceReference.getBundle()
if (b.getState() == Bundle.ACTIVE) { ... }
will do the trick? Of course, I would also surround this with the necessary try/catch clauses to detect hardware/network failure.
Can you clarify what you mean by "all bundles in a pipeline"?
You are right that a bundle can provide and consume zero or more services, but if I were to create a watchdog for an OSGi system I would use one of two approaches:
If the nodes in your distributed system provide mainly REST services, I would write a separate "watchdog" program that monitors these REST services to see if they still respond (on any of the nodes in my distributed system). You can either make "real" calls or just send a HEAD request and see if you get a response (a small sketch follows these two approaches).
If the nodes in your distributed system provide mainly OSGi services, I would write a watchdog bundle and deploy that to each node. I would then add a REST endpoint to my watchdog to allow me to monitor it remotely (by another watchdog, similar to approach #1).
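As an example of approach #1, a minimal external check might look like this sketch (written here in Ruby; the health-check URLs are placeholders):

require 'net/http'
require 'uri'

%w[http://node1:8080/health http://node2:8080/health].each do |url|
  uri = URI.parse(url)
  begin
    res = Net::HTTP.start(uri.host, uri.port, open_timeout: 5, read_timeout: 5) do |http|
      http.head(uri.request_uri) # HEAD is enough to see whether the service responds
    end
    puts "#{url} -> #{res.code}"
  rescue StandardError => e
    puts "#{url} unreachable: #{e.class}" # network or hardware failure shows up here
  end
end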
Checking the active state of a bundle will tell you nothing. Bundles will remain active once started, but the services they provide could be unresponsive.
I'm trying to put a new version of my web server (which runs as a binary) on an Amazon EC2 instance. The problem is that I have to shut the process down each time to do so. Does anyone know a workaround where I could upload it while the process is still running?
Even if you could, you don't want to. What you want to do is:
Have at least 2 machines running behind a load balancer
Take one of them out of the LB pool
Shut down the processes on it
Replace them (binaries, resources, config, whatever)
Bring them back up
Then put it back in the pool.
Do the same for the other machine.
Make sure your changes are backward compatible, as there will be a short period of time when both versions run concurrently.
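A rough sketch of that rolling procedure against a classic ELB, using the Ruby aws-sdk (the load balancer name, the instance IDs and the deploy_to helper are placeholders; an ALB would use target groups instead):

require 'aws-sdk-elasticloadbalancing'

elb = Aws::ElasticLoadBalancing::Client.new(region: 'us-east-1')
instance_ids = %w[i-0123456789abcdef0 i-0fedcba9876543210]

instance_ids.each do |id|
  # take the instance out of the LB pool
  elb.deregister_instances_from_load_balancer(
    load_balancer_name: 'my-elb',
    instances: [{ instance_id: id }]
  )

  # stop the process, replace the binary/config, start it again
  deploy_to(id) # hypothetical helper, e.g. SSH + systemctl

  # put it back and wait until the ELB reports it InService
  elb.register_instances_with_load_balancer(
    load_balancer_name: 'my-elb',
    instances: [{ instance_id: id }]
  )
  sleep 5 until elb.describe_instance_health(
    load_balancer_name: 'my-elb',
    instances: [{ instance_id: id }]
  ).instance_states.first.state == 'InService'
end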