Modifying the number of tasks executed on a Mesos slave

In a Mesos ecosystem (master + scheduler + slaves), with the master executing tasks on the slaves, is there a configuration that allows modifying the number of tasks executed on each slave?
Say, for example, the Mesos master currently runs 4 tasks on one of the slaves (each task using 1 CPU). We have 4 slaves (4 cores each), and apart from this one slave the other three are not being used.
Instead of this scenario, I would prefer the master to run 1 task on each of the 4 slaves.
I found this Stack Overflow question and these configuration options relevant to this case, but I am still not clear on how to use the --isolation=VALUE or --resources=VALUE options here.
Thanks for the help!

I was able to reduce the number of tasks executed on a single host at a time by adding the following properties to the startup script for the Mesos agent:
--resources="cpus:<<value>>" and --cgroups_enable_cfs=true.
This, however, does not address the concurrent-scheduling requirement of having each agent execute a task at the same time. For that, the scheduler code needs to be looked into, as also suggested above.
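For reference, a minimal sketch of what such an agent startup could look like (the master address, work directory and resource values below are placeholders, and older releases name the binary mesos-slave):

```
# Advertise only 1 CPU to the master so that at most one 1-CPU task
# can be scheduled on this agent at a time; CFS enforces the CPU limit.
mesos-agent \
  --master=zk://zk-host:2181/mesos \
  --work_dir=/var/lib/mesos \
  --resources="cpus:1;mem:4096" \
  --cgroups_enable_cfs=true
```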

Related

Arrays and user job limitations on SLURM?

I'm a new SLURM user and I'm trying to figure out the best way to submit a job that requires the same command to run 400,000 times with different input files (approximately 200MB memory per CPU, 4 minutes for one instance, each instance runs independently).
I read through the documentation, and so far it seems that arrays are the way to go.
I can use up to 3 nodes on my HPC with 20 cores each, which means that I could run 60 instances of my command at the same time. However, the per-user limit is 10 jobs running at the same time, with 20 jobs in the queue.
So far, everything I've tried runs each instance of the command as a separate job, thus limiting it to 10 instances in parallel.
How can I fully utilize all available cores in light of the job limits?
Thanks in advance for your help!
You can have a look at tools like GREASY, which will allow you to run a single Slurm job and spawn multiple subtasks.
The documentation, which explains how to install and use it, can be found here.
You don't even need a job array to achieve this. Submit a single job via the sbatch job_script command; inside job_script you can customise the submission, using srun (backgrounded with &) inside a for loop to keep the maximum number of instances running.
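A minimal sketch of that approach, assuming hypothetical input files under inputs/ and a placeholder my_command (wait -n needs bash 4.3+, and on newer Slurm releases --exact replaces this use of --exclusive for job steps):

```
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks=60          # 3 nodes x 20 cores
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=250M   # ~200MB per instance plus headroom
#SBATCH --time=24:00:00

for f in inputs/*.dat; do
  # run one instance per allocated task slot, in the background
  srun --nodes=1 --ntasks=1 --exclusive my_command "$f" &
  # keep at most 60 instances in flight at any time
  while (( $(jobs -rp | wc -l) >= 60 )); do
    wait -n
  done
done
wait   # wait for the remaining instances to finish
```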

Limit number of concurrent map tasks in Hadoop 2.6.0

I want to restrict the number of map tasks running simultaneously on one slave node.
In my case, when I submit my job, Hadoop generates 8 map tasks, and when I look at the Job History UI at port 19888 I always see that all 8 map tasks started at the same time on the same slave node.
I even tried setting the attribute mapreduce.tasktracker.map.tasks.maximum to 4 (from "how to restrict the concurrent running map tasks?"). It still didn't work for me.
Does anyone have experience dealing with this issue?
For Hadoop 2.6 it should be the newer API; see mapred-default.xml.
Set mapreduce.tasktracker.map.tasks.maximum to 4:
"The maximum number of map tasks that will be run simultaneously by a task tracker."
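As a sketch only, the property can also be passed per job on the command line, assuming the job's driver uses ToolRunner so that -D options are honoured (the jar, class and paths below are placeholders):

```
# Request at most 4 concurrent map tasks per tracker for this job.
hadoop jar my-job.jar com.example.MyJob \
  -D mapreduce.tasktracker.map.tasks.maximum=4 \
  /input /output
```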

How can you run more than one simultaneous job in Ansible Tower?

It seems that all jobs are enqueued, and only one will run at a time. How can we run more than one?
Tower is designed to parallelize jobs, but there are a couple of cases where it will not.
If you have your inventory or SCM set up to "update on launch" with no cache, or the cache has expired, then any additional jobs will be stuck pending behind the inventory or SCM update. The inventory and SCM will not update until after the currently running job is done.
If you are trying to run multiple jobs against the same host: Tower will not run multiple jobs against the same host at the same time, in order to avoid race conditions (localhost is a possible exception). If you need multiple jobs to run against the same host at the same time, then you need to create two inventories, put that host in both, and run the two jobs against different inventories. In that situation, Tower does not know that you are running against the same host.
Jobs which share the same inventory or SCM source cannot run at the same time.
Suppose you have a job comprised of three tasks:
task 1: "do x", task 2: "do y", task 3: "do z"
With ansible "do x" will run on all the servers, then "do y" will run on all the servers, then "do z" will run on all the servers.
Also, I said "all serves" but in fact it maxes out at the ansible "forks" value, which defaults to 5. In my 100 server enviroment I set this value to 20. more on this here: http://docs.ansible.com/intro_configuration.html#forks
Remember the strength of ansible is doing a job ( a collection of tasks) on many machines at the same time. If what you want is to run the same task many times on a single machine, then you want something like fork, or parallel.
In fact Ansible will try to run "do x" as many times as it can across many machines. You can adjust this behavior having the whole job run on a portion of machines before it gets started on more machines with the "serial" keyword (http://docs.ansible.com/playbooks_delegation.html#rolling-update-batch-size).
Not the subtle difference between forks, and serial.
forks is "per task"
serial is "per job" ( collection of tasks )
David Thornton
Edit: I re-read your question. This is about running more than one job at a time, not running more than one task in a job. So I think you are correct for ansible-awx, but not for the command line. Via the web interface you can submit a job to the job queue, but you can't make ansible-awx run more than one job at a time, I think. However, via the command line, if you open more than one window you can run multiple ansible-playbooks at the same time. Do you have an Ansible support account? Those guys are great IMHO; they have taken a lot of time to answer my questions (like your question).
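For the command-line case, a trivial sketch with hypothetical playbook names:

```
# Launch two playbooks concurrently from one shell instead of two windows.
ansible-playbook deploy-web.yml &
ansible-playbook deploy-db.yml &
wait   # block until both runs have finished
```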
Simultaneous jobs can be executed from Tower. Job templates have "Enable Concurrent Jobs" option. See section "15.4. Job Concurrency" at http://docs.ansible.com/ansible-tower/latest/html/userguide/jobs.html.
If I have 3 different tasks running on a single server, that is called synchronous mode: the 3 tasks are assigned to a single job ID, and each task executes one after the other, which consumes a lot of time.
In Ansible versions later than 2.5 we can get 3 job IDs for 3 different tasks and start executing them at the same time, which saves a lot of time. This is called asynchronous mode.
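As a rough illustration of the asynchronous style using an ad-hoc command (the host pattern and command are placeholders): -B gives the task a time budget in seconds, and -P 0 fires it off without polling, returning an async job ID per host.

```
# Start a long-running command on all hosts without waiting for it;
# each host returns an async job ID that can be checked later.
ansible all -B 3600 -P 0 -a "/usr/local/bin/long_task --do-stuff"
```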

How to find the right proportion between Hadoop instance types

I am trying to find out how many MASTER, CORE, TASK instances are optimal to my jobs. I couldn't find any tutorial that explains how do I figure it out.
How do I know if I need more than 1 core instance? What are the "symptoms" I would see in EMR's console metrics that would hint I need more than one core instance? So far, when I tried the same job with 1*core + 7*task instances, it ran pretty much like on 8*core, but that doesn't make much sense to me. Or is it possible that my job is so CPU-bound that the I/O is negligible? (I have a map-only job that parses Apache log files into a CSV file.)
Is there such a thing as having more than 1 master instance? If yes, when is it needed? I wonder, because my master node pretty much just waits for the other nodes to do the job (0% CPU) 95% of the time.
Can the master and the core node be identical? I can have a master-only cluster, where the one and only node does everything. It seems logical to be able to have a cluster with 1 node that is both the master and the core, with the rest being task nodes, but it seems impossible to set it up that way with EMR. Why is that?
The master instance acts as a manager and coordinates everything that goes in the whole cluster. As such, it has to exist in every job flow you run but just one instance is all you need. Unless you are deploying a single-node cluster (in which case the master instance is the only node running), it does not do any heavy lifting as far as actual MapReducing is concerned, so the instance does not have to be a powerful machine.
The number of core instances that you need really depends on the job and how fast you want to process it, so there is no single correct answer. A good thing is that you can resize the core/task instance group, so if you think your job is running slow, then you can add more instances to a running process.
One important difference between core and task instance groups is that the core instances store actual data on HDFS whereas task instances do not. In turn, you can only increase the core instance group (because removing running instances would lose the data on those instances). On the other hand, you can both increase and decrease the task instance group by adding or removing task instances.
So these two types of instances can be used to adjust the processing power of your job. Typically, you use on-demand instances for core instances because they must be running all the time and cannot be lost, and you use spot instances for task instances because losing task instances does not kill the entire job (e.g., the tasks not finished by task instances will be rerun on core instances). This is one way to run a large cluster cost-effectively by using spot instances.
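For example, a running cluster's task instance group can be resized with the AWS CLI; a sketch with a placeholder instance-group ID and count:

```
# Resize a running instance group. Task groups can grow or shrink;
# core groups can only grow (shrinking would lose HDFS data).
aws emr modify-instance-groups \
  --instance-groups InstanceGroupId=ig-EXAMPLE12345,InstanceCount=8
```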
The general description of each instance type is available here:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/InstanceGroups.html
Also, this video may be useful for using EMR effectively:
https://www.youtube.com/watch?v=a5D_bs7E3uc

How does Quartz detect node failures?

My production environment runs a Java scheduler job using Quartz 2.1.4 on a WebLogic cluster with 4 machines. Normally only one scheduled job executes, on one cluster node (node 1), and it ran that way for a few months; but last night node 2 suddenly decided that node 1 had failed and took over the executing job. In fact, node 1 showed no errors (according to the server, network, database and application logs). This event caused duplicate messages to be created because 2 processes executed concurrently.
What mechanism does Quartz use to detect node failure? A ping scan, a heartbeat via UDP broadcast, database response time, or something else? Is there any configuration for it?
I have read the Quartz configuration guide (http://quartz-scheduler.org/documentation/quartz-2.1.x/configuration/ConfigJDBCJobStoreClustering), but there is no answer there.
I am using JDBCJobStore. After detailed checking, we found a database (Oracle) statement that was executing abnormally long (from 5 sec to 30 sec), and the incident happened during this period of time. Do you think it is related?
My configuration is:
```
org.quartz.threadPool.threadCount=10
org.quartz.threadPool.threadPriority=5
org.quartz.jobStore.misfireThreshold = 10000
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
```
Anyone have this information? Thanks.
I know the answer is very late, but maybe somebody like the two of us will still need it.
Short version: it is all handled through the database. The important property is org.quartz.jobStore.clusterCheckinInterval.
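A minimal sketch of the clustering-related lines in quartz.properties, to sit alongside the JDBCJobStore settings above (the interval value is only an example, in milliseconds):

```
# Clustering must be switched on explicitly; the check-in interval controls
# how often each node updates its row in the SCHEDULER_STATE table.
cat >> quartz.properties <<'EOF'
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000
EOF
```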
Long version (all credit goes to http://flylib.com/books/en/2.65.1.91/1/):
Detecting Failed Scheduler Nodes
When a Scheduler instance performs the check-in routine, it looks to
see if there are other Scheduler instances that didn't check in when
they were supposed to. It does this by inspecting the SCHEDULER_STATE
table and looking for schedulers that have a value in the
LAST_CHECK_TIME column that is older than the property
org.quartz.jobStore.clusterCheckinInterval (discussed in the next
section). If one or more nodes haven't checked in, the running
Scheduler assumes that the other instance(s) have failed.
Additionally the next paragraph might also be important:
Running Nodes on Separate Machines with Unsynchronized Clocks
As you can ascertain by now, if you run nodes on different machines and the
clocks are not synchronized, you can get unexpected results. This is
because a timestamp is being used to inform other instances of the
last time one node checked in. If that node's clock was set for the
future, a running Scheduler might never realize that a node has gone
down. On the other hand, if a clock on one node is set in the past, a
node might assume that the node has gone down and attempt to take over
and rerun its jobs. In either case, it's not the behavior that you
want. When you're using different machines in a cluster (which is the
normal case), be sure to synchronize the clocks. See the section
"Quartz Clustering Cookbook," later in this chapter for details on how
to do this.
