I am trying to run a Spark Streaming app through Marathon on Mesos, and this job eventually stores some counts in a Cassandra instance. My question is: should I set the number of instances (in Marathon) for this app to 2 for HA? The issue is, wouldn't the second instance just be a replica of the first, so that processing and results would be duplicated?
No, you don't set the number of instances to 2 for HA. Marathon will restart any app that has gone down, for whatever reason. It is good practice to implement health checks, though.
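As a rough illustration of the advice above, here is a minimal sketch that builds a Marathon app definition with a single instance and an HTTP health check. The app id, command, health path and all numeric values are hypothetical placeholders, and it assumes the job exposes an HTTP endpoint that Marathon can poll; the field names follow Marathon's app definition format.

```python
import json


def build_marathon_app(app_id, cmd, health_path="/health"):
    """Return a Marathon app definition as a plain dict.

    All values here are illustrative assumptions, not tuned settings.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 1.0,
        "mem": 1024,
        # One instance only: Marathon restarts it if it dies, so a second
        # copy is not needed for HA and would duplicate the processing.
        "instances": 1,
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": health_path,  # hypothetical endpoint
                "portIndex": 0,
                "gracePeriodSeconds": 300,
                "intervalSeconds": 30,
                "timeoutSeconds": 10,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


if __name__ == "__main__":
    app = build_marathon_app("/spark-streaming-counts", "./run-streaming-job.sh")
    print(json.dumps(app, indent=2))
```

The resulting JSON could then be submitted to Marathon's apps endpoint. After the configured number of consecutive failed checks, Marathon kills and replaces the task.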
We are using the Quartz scheduler for scheduling a few jobs.
We run on multiple VMs, so we have configured Quartz in clustered mode using a MySQL job store.
Now we are migrating to Kubernetes. There will be multiple main pods that handle all the incoming requests, and there will also be worker pods that only run these schedulers. The problem is that we don't want the main pods to execute these Quartz jobs.
I was thinking of disabling the schedulers on the main pods and enabling them only on the worker pods. That way we can achieve what is required.
So the question is: if we disable the schedulers on the main pods, or put them on standby, will that also put the schedulers on the other (worker) instances on standby, since all of them are running in cluster mode?
If anyone has a better solution, please help.
I don't have any solution in mind at all, not even a workaround.
If we don't find the solution then we have to stick with our older VMs and cannot migrate to Kubernetes.
I have an issue where, from time to time, one of the EC2 instances in my cluster has its ECS agent disconnected. This silently removes the EC2 instance from the cluster (i.e. it is no longer eligible to run any services) and silently drains my cluster of serving instances. My cluster is backed by an autoscaling group that spawns servers to keep up the healthy count. But the servers with a disconnected ECS agent are not marked as unhealthy, so the autoscaling group thinks everything is all right.
I have the feeling there must be something (easy) to mitigate this, or I'm having a big issue with choosing ECS and using it in production.
We had this issue for a long time. With each new AWS ECS-optimized AMI it got better, but as of 3 months ago it still happened from time to time. As mcheshier mentioned, make sure to always use the latest AMI, or at least the latest AWS ECS agent.
The only way we were able to resolve it was through:
Timed autoscale rotations
We would try to prevent it by scaling up and down at random times
Good CloudWatch alerts
We happened to have our application set up as a bunch of microservices that were all queue (SQS) based, so we could scale up and down based on the queues. We had decent monitoring set up that let us approximate queue rates across the number of ECS containers. When we detected that the rate was off, we would rotate that whole ECS instance. For example, say our cluster deployed 4 running containers of worker-1, and we estimate that each worker handles 1000 messages per 5 minutes. If our queue rate was 3000 per 5 minutes with 4 workers, then 1 was not working as expected. We had some scripts set up in Lambda to find the faulty one and terminate the entire instance that ran that container.
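The arithmetic in that example can be sketched as follows. This is only an illustration of the estimate, not the actual Lambda scripts: the function name and the choice to round up (so that a small dip in throughput does not flag a worker) are assumptions.

```python
import math


def estimate_stalled_workers(observed_rate, per_worker_rate, running_workers):
    """Estimate how many running workers are not pulling their weight.

    observed_rate:   messages processed per window across all workers
    per_worker_rate: messages one healthy worker is expected to process
                     in the same window
    running_workers: number of containers the cluster reports as running
    """
    if per_worker_rate <= 0:
        raise ValueError("per_worker_rate must be positive")
    # Number of healthy workers needed to explain the observed rate,
    # rounded up so partial shortfalls are not counted as a dead worker.
    effective = min(running_workers, math.ceil(observed_rate / per_worker_rate))
    return running_workers - effective


if __name__ == "__main__":
    # 4 workers at 1000 messages per 5 minutes each, but only 3000 observed.
    print(estimate_stalled_workers(3000, 1000, 4))
```

With the numbers from the example (3000 observed, 1000 per worker, 4 workers) the estimate is 1 stalled worker, which is the signal to go looking for the faulty instance.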
I hope this helps. I realize it's specific to our in-house application, but the advice I can give you and anyone else is to take the initiative and publish as many metrics as you can. This will let you do some neat analytics and look for kinks in the system, this being one of them.
I've been playing with a Mesos cluster for a little while, and I'm thinking of using Mesos in our production environment. One problem I can't seem to find an answer to: how do I properly schedule long-running apps that will have varying load?
Marathon has a "CPUs" property, where you can set the weight for CPU allocation to a particular app. (I'm planning on running Docker containers.) But from what I've read, it is only a weight, not a reservation, allocation, or limit that I am setting for the app. The app can still use 100% of the CPU on the server if it's the only thing running. The problem is that for long-running apps, resource demands change over time. A web server's load, for example, is directly proportional to its traffic. Since Mesos also treats this setting as a "reservation" when placing tasks, I am choosing between two evils: set it too low, and Mesos may start too many processes on the same host and all of them will suffer, with host CPU going past 100%. Set it too high, and CPU will sit idle, because a reservation has been made (or so Mesos thinks) but nothing is using those resources.
How do you approach this problem? Am I missing something in how Mesos and Marathon handle resources?
I was thinking of an ideal way of doing this:
Specify weight for CPU for different apps (on the order of, say, 0.1 through 1), so that when going gets tough, higher priority gets more (as is right now)
Have Mesos slave report "Available LA" with its status (e.g. if 10 minute LA is 2, with 8 CPUs available, report 6 "Available LA")
Configure Marathon to require "Available LA" resource on the slave to schedule a task (e.g. don't start on particular host if Available LA is < 2)
When available LA goes to 0 (due to influx of traffic at the same time as some job was started on the same server before the influx) - have Marathon move jobs to another slave, one that has more "Available LA"
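The "Available LA" idea in the list above can be sketched as below. This is a hypothetical illustration of the proposed metric only; neither Mesos nor Marathon exposes such a resource out of the box, and the function names and the default threshold of 2 are taken from or assumed around the example figures in the list.

```python
def available_la(cpu_count, load_average):
    """Headroom on a host: CPUs minus load average, never below zero."""
    return max(0.0, cpu_count - load_average)


def can_schedule(cpu_count, load_average, required_la=2.0):
    """Return True if the host has at least `required_la` of headroom."""
    return available_la(cpu_count, load_average) >= required_la


if __name__ == "__main__":
    # 8 CPUs with a 10-minute load average of 2 leaves 6 "Available LA".
    print(available_la(8, 2))
    print(can_schedule(8, 2))
    # A nearly saturated host would be skipped.
    print(can_schedule(8, 7.5))
```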
Is there a way to achieve any of this?
So far, I gather that I could possibly write a custom isolator module that runs on the slaves and reports this custom metric to the master. Then I could use it in resource negotiation. Is this true?
I wasn't able to find anything on Marathon rescheduling tasks on different nodes if one becomes overloaded. Any suggestions?
As of Mesos 0.23.0 oversubscription is supported. Unfortunately it is not yet implemented in Marathon: https://github.com/mesosphere/marathon/issues/2424
In order to dynamically do allocation, you can use the Mesos slave metrics along with the Marathon HTTP API to scale, for example, as I've done here, in a different context. My colleague Niklas did related work with nibbler, which might also be of help.
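A minimal sketch of that approach, under stated assumptions: it takes per-agent utilization fractions that you have already collected (for example from each agent's metrics endpoint), decides on a new instance count, and builds the request that changes `instances` through Marathon's `PUT /v2/apps/{appId}` endpoint. The thresholds, bounds, host name and function names are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def desired_instances(current, utilizations, high=0.8, low=0.3,
                      min_instances=1, max_instances=10):
    """Pick a new instance count from average utilization (0.0 to 1.0)."""
    if not utilizations:
        return current  # no data, leave the app alone
    average = sum(utilizations) / len(utilizations)
    if average > high:
        target = current + 1
    elif average < low:
        target = current - 1
    else:
        target = current
    # Clamp to the allowed range.
    return max(min_instances, min(max_instances, target))


def build_scale_request(marathon_url, app_id, instances):
    """Build (but do not send) the Marathon request that sets `instances`."""
    url = f"{marathon_url.rstrip('/')}/v2/apps/{app_id.strip('/')}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    target = desired_instances(2, [0.9, 0.95])
    request = build_scale_request("http://marathon.example:8080", "/web", target)
    print(request.get_method(), request.full_url, request.data)
    # Sending it would be: urllib.request.urlopen(request)
```

Scaling one step at a time keeps the loop from overreacting to a single noisy sample. A real controller would also want a cool-down period between changes.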
I would like to use Apache Marathon to manage resources in a clustered product. Mesos and Marathon solve some of the "cluster resource manager" problems for additional components that need to be kept running with HA, failover, etc.
However, there are a number of services that need to be kept running to keep mesos and marathon running (like zookeeper, mesos itself, etc). What can we use to keep those services running with HA, failover, etc?
It seems like solving this across a cluster (managing how many instances of zookeeper, etc, and where they run and how they fail over) is exactly the problem that mesos/marathon are trying to solve.
As the Mesos HA doc explains, you can start multiple Mesos masters and let ZK elect the leader. Then if your leading master fails, you still have at least 2 left to handle things. It is common to use something like systemd to automatically restart the mesos-master on the same host if it's still healthy, or something like Amazon AutoScalingGroups to ensure you always have 3 master machines even if a host dies.
The same can be done for Marathon in its HA mode (on by default if you start multiple instances pointing to the same znode). Many users start these on the same 3 nodes as their Mesos masters, using systemd to restart failed Marathon services, and the same ASG to ensure there are 3 Mesos/Marathon master nodes.
These same 3 nodes are often configured to be the ZK quorum as well, so there are only 3 nodes you have to manage for all these services running outside of Mesos.
Conceivably, you could bootstrap both Mesos-master and Marathon into the cluster as Marathon/Mesos tasks. Spin up a single Mesos+Marathon master to get the cluster started, then create a Mesos-master app in Marathon to launch 2-3 masters as Mesos tasks, and a Marathon-master app in Marathon to launch a couple of HA Marathon instances (as Mesos tasks). Once those are healthy, you can kill the original standalone Mesos/Marathon master and the cluster would failover to the self-hosted Mesos and Marathon masters, which would be automatically restarted elsewhere on the cluster if they failed. Maybe this would work with ZK too. You'd probably need something like Mesos-DNS and/or ELB to let other services find Mesos/Marathon. I doubt anybody's running Mesos this way, but it's crazy enough it just might work!
In order to understand this, I suggest you spend a few minutes reading up on the architecture and the HA part in the official Mesos doc. There, it is clearly explained how HA/failover in Mesos core is handled (which is, BTW, nothing magic—many systems I know of use pretty much exactly this model, incl. HBase, Storm, Kafka, etc.).
Also, note that the challenge of keeping a handful of Mesos masters and ZK nodes alive is naturally not directly comparable with keeping potentially tens of thousands of processes alive across a cluster, evicting them, or failing them over (in terms of fan-out, memory footprint, throughput, etc.).
Is one ZooKeeper installation good enough to be used by Hadoop, Kafka, and Storm clusters?
I want to deploy all of them in one test environment and try playing with those technologies.
Can I use one ZooKeeper installation for that? Could the same znode be shared by a number of services?
Yes, you can use a single zookeeper installation to support more than one cluster and indeed different types of clusters. This has been the case for a long time - here's a link to a good discussion on it from 2009: http://zookeeper-user.578899.n2.nabble.com/Multiple-ZK-clusters-or-a-single-shared-cluster-td3277547.html
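One common way to keep several services apart on a shared ensemble is to give each its own root path, using the chroot suffix that ZooKeeper connection strings support (`host:port,host:port/path`). The sketch below only builds such strings; the host names and the per-service paths are hypothetical, and each system has its own configuration key for where this value goes.

```python
def chrooted_connect_string(hosts, chroot):
    """Join ZooKeeper hosts and append a chroot path such as '/kafka'."""
    if not hosts:
        raise ValueError("at least one host is required")
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    # A bare '/' means no chroot, so it collapses to just the host list.
    return ",".join(hosts) + chroot.rstrip("/")


if __name__ == "__main__":
    ensemble = ["zk1.example:2181", "zk2.example:2181", "zk3.example:2181"]
    for service in ("kafka", "storm", "hadoop-ha"):
        print(service, "->", chrooted_connect_string(ensemble, "/" + service))
```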
For testing this is fine (even running it on a single ZK server). For production use, though, you'll want at least a 3-node cluster. And you should think carefully about running everything off of a single cluster.
The reason is that if you run multiple Hadoop, Storm and Kafka clusters off of a single ZK cluster, that one set of servers becomes a single point of failure for all of your distributed systems. You can beef up the ZK setup with more than 3 servers (let's say 7) so that it can handle multiple failures, but if someone were to accidentally bring ZK down all your distributed environments would come down too.
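To make the "3 versus 7 servers" point concrete, here is a small sketch of the majority-quorum arithmetic: an ensemble stays available as long as a strict majority of its servers is up. The function names are just for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, "servers: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

A 3-server ensemble tolerates 1 failure and a 7-server ensemble tolerates 3. An even-sized ensemble tolerates no more failures than the next smaller odd size, which is why odd sizes are the usual choice. None of this protects against someone taking the whole ensemble down at once.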
Some argue that you would be better off with more isolation between systems. I think it varies by use case but I'd be careful about putting all of your eggs in one ZK basket.