I have been comparing Marathon with Aurora.
Marathon:
Easy to use
Lightweight
More active in adding new features
Aurora:
A richer feature set and more flexible object configurations
Heavy and difficult to use
More conservative about adding new features
Right now, we cannot decide which one to use. Here are several questions I hope someone can answer:
How do you handle a group of tasks in Aurora? Marathon supports grouping, so a group of tasks can be managed together, but I cannot find grouping in Aurora.
How do you configure environment variables for processes/tasks in Aurora?
How do you add an event handler for status updates in Aurora? We would like to implement customized alerting, which is possible in Marathon.
Aurora is a great scheduler and is very solid; the templating and its DSL in particular make Aurora very powerful. I had a few gists and docs ready as examples, but I can only post two URLs here.
Aurora supports processes, tasks, and jobs. Usually, when we wish to have work executed on the same host, we place it in processes within a task; it is also possible to combine or concatenate tasks. Aurora has excellent support for executing processes, either sequentially or in parallel.
An example can be found here:
http://aurora.apache.org/documentation/latest/reference/configuration-tutorial/#sequentialtask-running-processes-in-parallel-or-sequentially
Another option is to run combined tasks:
http://aurora.apache.org/documentation/latest/reference/configuration-tutorial/#combining-tasks
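To make this concrete, here is a rough sketch of what such a configuration can look like in an .aurora file (the DSL is Python-based). The process names, commands, and resource values below are made up for illustration:

```
# Sketch of an .aurora config; all names, commands, and sizes are hypothetical.
download = Process(name = 'download', cmdline = 'curl -O https://example.com/data.tar.gz')
unpack   = Process(name = 'unpack',   cmdline = 'tar xzf data.tar.gz')
analyze  = Process(name = 'analyze',  cmdline = './analyze data/')

# SequentialTask runs its processes one after another in the same sandbox.
pipeline = SequentialTask(
  name      = 'pipeline',
  processes = [download, unpack, analyze],
  resources = Resources(cpu = 1.0, ram = 256*MB, disk = 512*MB))

# A plain Task runs its processes in parallel unless ordering constraints are added.
side_work = Task(
  name      = 'side_work',
  processes = [Process(name = 'metrics', cmdline = './emit_metrics.sh')],
  resources = Resources(cpu = 0.5, ram = 128*MB, disk = 128*MB))

# Tasks.combine merges the two tasks so all their processes run together on one host.
combined = Tasks.combine(pipeline, side_work)
```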
Documentation on how to add environment variables was recently added. For now you can check this link until it is published on the website:
https://github.com/apache/aurora/blob/2a8c667ec1b48900530975169f132d9deb098399/docs/reference/configuration-tutorial.md#getting-environment-variables-into-the-sandbox
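Until then, here is a sketch of two common patterns; the variable names, values, and scripts below are hypothetical, and the linked document is the authoritative reference. You can either set variables inline in a process's cmdline, or have one process write a profile file into the shared sandbox that a later process sources:

```
# Sketch: two ways to get environment variables to Aurora processes.
# Variable names, values, and scripts are hypothetical.

# Option 1: set the variables inline in the cmdline that needs them.
serve = Process(
  name    = 'serve',
  cmdline = 'APP_ENV=production LOG_LEVEL=info ./run_server.sh')

# Option 2: one process writes a profile file into the shared sandbox,
# and a later process sources it before starting the real work.
write_env = Process(
  name    = 'write_env',
  cmdline = 'echo "export APP_ENV=production LOG_LEVEL=info" > profile.sh')

worker = Process(
  name    = 'worker',
  cmdline = '. ./profile.sh && ./run_worker.sh')

env_task = SequentialTask(
  name      = 'env_task',
  processes = [write_env, worker],
  resources = Resources(cpu = 1.0, ram = 128*MB, disk = 128*MB))
```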
Currently, to my knowledge, there are no event handlers for status updates; perhaps someone else can answer this question better than me. An option that comes close:
You can add a finalizing task that sends out an alert or triggers an event. It will run after all other processes have stopped, and you can run multiple finalizing tasks. I do not have a direct link, but if you search for 'final' on this page you will find more:
http://aurora.apache.org/documentation/latest/reference/configuration/
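As a rough sketch of that finalizing approach: a Process can be marked with final = True so it runs after the ordinary processes have terminated. The service script and webhook URL below are hypothetical:

```
# Sketch: a finalizing process that fires an alert once the regular processes stop.
# The service script and webhook URL are hypothetical.
main = Process(name = 'main', cmdline = './run_service.sh')

notify = Process(
  name    = 'notify',
  cmdline = 'curl -X POST -d "job finished on $(hostname)" https://alerts.example.com/hook',
  final   = True)  # finalizing processes run after all ordinary processes have finished

alerting_task = Task(
  name      = 'alerting_task',
  processes = [main, notify],
  resources = Resources(cpu = 1.0, ram = 256*MB, disk = 256*MB))
```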
I just saw that this question was asked a year ago, but perhaps someone else will find this helpful.
I need a better way to use my alerting code. Right now I have code that checks the free space on AWS ECS and sends a simple notification to Slack via the Slack API if the free space is less than 5 GB. I run this code from Jenkins on a periodic schedule every 15 minutes. But once the notification is triggered, I want the check to pause for 4 hours so it won't flood the Slack channel with messages, so I used sleep 14400 after the condition is triggered. But this leaves a Jenkins executor waiting. Is there a better way to do this?
If you really want a better way, you should use better tools. There are many tools (some free) out there that can monitor something in a stateful manner (for example, using a daemon).
Writing to a log (or a Slack channel) in this Jenkins context is essentially stateless; for example, you cannot check whether an alarm is currently triggered or not.
Since you cannot check whether an alarm is already triggered, using Jenkins with the logic you requested in your question (a 'snooze feature') can be very ugly.
In general I would recommend using the Conditional BuildStep plugin to trigger a step only if a condition is met (i.e. if the alarm is not already triggered). But since there is no way for you to poll this information, or to achieve it in Jenkins without the solution being 'hackish' (like creating a file to indicate the alert is on, and deleting it from another job if it was created more than 4 hours ago; a sketch of that approach follows below), I would suggest looking at tools more suitable for the job.
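For completeness, here is a rough Python sketch of that file-based snooze, in case you stay with Jenkins. The snooze path, the 5 GB threshold, and the Slack webhook URL are placeholders, not values from your setup:

```
# Rough sketch of the 'hackish' file-based snooze described above.
# SNOOZE_FILE, the threshold, and the webhook URL are placeholders.
import os
import shutil
import time

import requests  # assumes the requests library is installed on the Jenkins agent

SNOOZE_FILE = '/var/tmp/disk_alert.snooze'
SNOOZE_SECONDS = 4 * 60 * 60  # stay quiet for 4 hours after an alert


def free_space_gb(path='/'):
    """Free space of the filesystem holding `path`, in GB (stand-in for the ECS check)."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def snoozed():
    """True if an alert was sent less than SNOOZE_SECONDS ago."""
    try:
        return time.time() - os.path.getmtime(SNOOZE_FILE) < SNOOZE_SECONDS
    except OSError:
        return False  # no snooze file yet, so not snoozed


def alert(message):
    requests.post('https://hooks.slack.com/services/XXX/YYY/ZZZ', json={'text': message})
    with open(SNOOZE_FILE, 'w') as f:   # touch the snooze file so the
        f.write(str(time.time()))       # following runs skip the alert


if __name__ == '__main__':
    free = free_space_gb()
    if free < 5 and not snoozed():
        alert('Free disk space is below 5 GB ({:.1f} GB left)'.format(free))
```

The job still runs every 15 minutes, but once an alert fires, later runs return immediately instead of holding an executor with sleep.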
I am currently setting up a POC network of dockerised services using DC/OS. As this is a POC, I have had to make many revisions to the containers in order to get things working.
Consequently, when drilling into some services using the DC/OS (v1.8.7) web UI, I can see hundreds of old tasks - the vast majority of which have a status of 'finished'.
I realise that I can filter the view to just the 'active' containers by clicking on the appropriate tab, but this is not what I am after, because I would like to be able to see when a container is staging; also, all the history for the finished tasks is worthless to me.
How do I purge DC/OS of these finished tasks, as they are clogging up the UI?
Is there a CLI command for this, or do I have to clear out stuff on the master nodes... or is there a handy plug-in that will manage this for me? I've been looking around both on the web and on the master nodes, but can't work out what I need to do.
In DC/OS 1.8.x there is no UI or CLI method to influence garbage collection. You can, however, influence some parameters with a custom install, such as gc_delay (default value in DC/OS: 2 days); other parameters use the Mesos defaults, like gc_disk_headroom (which is left at 0.1, meaning Mesos aims to keep 10% of the assigned disk free).
For the parameters you can change at install time, see the Install Configuration Parameters docs for more details.
I have a Laravel application where the Application servers are behind a Load Balancer. On these Application servers, I have cron jobs running, some of which should only be run once (or run on one instance).
I did some research and found that people seem to favor a lock-system, where you keep all the cron jobs active on each application box, and when one goes to process a job, you create some sort of lock so the others know not to process the same job.
I was wondering if anyone had more details on this procedure in regards to AWS, or if there's a better solution for this problem?
You can build distributed locking mechanisms on AWS using DynamoDB with strongly consistent reads. You can also do something similar using Redis (ElastiCache).
Alternatively, you could use Lambda scheduled events to send a request to your load balancer on a cron schedule. Since only one back-end server would receive the request, that server could execute the cron job.
These solutions tend to break when your autoscaling group experiences a scale-in event and the server processing the task gets deleted. I prefer to have a small server, like a t2.nano, that isn't part of the cluster and schedule cron jobs on that.
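To illustrate the DynamoDB idea from the first paragraph, here is a minimal Python (boto3) sketch using a conditional write as the lock. The table name 'cron_locks', its 'lock_name' string key, and the lease length are assumptions, not an established schema:

```
# Minimal sketch of a DynamoDB-based lock; table name, key, and lease are assumptions.
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client('dynamodb')
LEASE_SECONDS = 300  # how long a successful lock is considered held


def acquire_lock(lock_name):
    """Return True only on the one server whose conditional write succeeds."""
    now = int(time.time())
    try:
        dynamodb.put_item(
            TableName='cron_locks',
            Item={
                'lock_name': {'S': lock_name},
                'expires_at': {'N': str(now + LEASE_SECONDS)},
            },
            # Succeeds only if nobody holds the lock or the previous lease expired.
            ConditionExpression='attribute_not_exists(lock_name) OR expires_at < :now',
            ExpressionAttributeValues={':now': {'N': str(now)}},
        )
        return True
    except ClientError as err:
        if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False  # another instance already holds the lock
        raise


if acquire_lock('nightly-report'):
    print('Lock acquired on this instance; run the cron job here.')
```

Every server can run this from cron; only the one whose conditional write succeeds does the work.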
Check out this package for a Laravel implementation of the lock system (DB implementation):
https://packagist.org/packages/jdavidbakr/multi-server-event
Also, this pull request solves this problem using the lock system (cache implementation):
https://github.com/laravel/framework/pull/10965
If you need to run stuff only once globally (so not once on every server) and 'lock' the thing that needs to be run, I highly recommend AWS SQS, because it offers exactly that: run a cron job to fetch a ticket; if you get one, parse it; otherwise, do nothing. All crons are active on all machines, but a ticket is 'in flight' once some machine has requested it, and that specific ticket cannot be requested by another machine.
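A minimal boto3 sketch of that ticket flow, which every server could run from cron (the queue URL is a placeholder):

```
# Sketch of the SQS 'ticket' approach: all machines poll, but a ticket that is
# in flight on one machine cannot be received by another. The queue URL is a placeholder.
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/cron-tickets'

resp = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=1,
    VisibilityTimeout=300,  # ticket stays invisible to other machines while we work
)

for msg in resp.get('Messages', []):
    print('Got a ticket, doing the work:', msg['Body'])  # stand-in for the real job
    # Delete the ticket so it is never processed again after the work succeeds.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])
```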
I set up a Mesos cluster running the Apache Aurora framework, and I registered 100 cron jobs that run every minute on a pool of 5 slave machines. I found that after being scheduled about 100 times, the cron jobs got stuck in the "PENDING" state. May I ask what kind of logs I can inspect and what the possible problem might be?
It could be a couple of things:
Do you still have sufficient resources in your cluster?
Are those resources offered to Aurora? Or maybe only to another framework?
Do you have any task constraints that prevent your tasks from being scheduled? (See the constraint sketch at the end of this answer.)
Possible information sources:
What does the tooltip or the expanded status say in the UI?
The Aurora scheduler has log files, but normally these are not needed for an end user to figure out why jobs are stuck in PENDING.
If you are stuck here, it would probably be best to drop by the #aurora IRC channel on freenode.
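Regarding the question about task constraints above, here is a sketch of where they live in a job configuration. Every name and value below is hypothetical, but the general point holds: a constraint that no slave can satisfy will keep instances in PENDING.

```
# Sketch: an Aurora cron job with a scheduling constraint. All values are hypothetical.
hello_process = Process(name = 'hello', cmdline = 'echo hello && sleep 30')

hello_task = Task(
  name      = 'hello',
  processes = [hello_process],
  resources = Resources(cpu = 0.1, ram = 32*MB, disk = 32*MB))

jobs = [Job(
  cluster       = 'devcluster',
  role          = 'www-data',
  environment   = 'prod',
  name          = 'hello_cron',
  cron_schedule = '* * * * *',
  task          = hello_task,
  instances     = 1,
  # 'limit:1' allows at most one active instance per distinct 'host' attribute value;
  # a constraint that no slave satisfies (e.g. a missing 'dedicated' attribute)
  # leaves instances PENDING indefinitely.
  constraints   = {'host': 'limit:1'})]
```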
I'm building a monitoring service similar to Pingdom, but monitoring different aspects of a system, and I'm using Sidekiq to queue the tasks, which is working well. What I need to do is schedule sending out pings every minute. Rather than using a cron-based system, which would require spinning up a new Ruby instance every minute, I have gone down the route of using Sidetiq (notice the different spelling, with a "t"), which uses Sidekiq's own queue to schedule future tasks. This feels like a neat solution; however, I am concerned it may not be the most reliable way of scheduling tasks. If there are issues with the system (as there inevitably will be at some point), will this method of scheduling tasks be less reliable than a cron-based method, and why?
Thanks
You give too short a description of your system's needs, but I'll try to guess how it could be:
First of all, using Sidekiq means you'll also need a Redis instance, and it also means you'll need a way to monitor the Sidekiq process (and possibly the Redis server) and restart it in case of failure.
A method based on cron tasks has fewer requirements and therefore far fewer ways to fail.
cron has been around for a long time; it's battle-tested and very reliable, but it has its drawbacks too.
That said, you can build a system with separate Redis instances in a master/slave configuration, use Redis Sentinel to implement failover in case the master fails, implement a monitoring/alerting system on top of this setup (you can use something super simple like http://contribsys.com/inspeqtor/ from the Sidekiq author), and start several Sidekiq instances on different machines.
With all of that, you can have a quite reliable system for running sidekiq with sidetiq.
Hope it helps