Limiting the number of qsub jobs to under the job limit - bash

I am trying to do parameter tuning of my learning model on a Bright compute cluster, which requires a large number of jobs because of the number of parameters being tuned. Each combination of parameters requires around 162 qsub jobs, and there are around 50 combinations of parameters that I need to check. This works out to around 162*50 ~= 8100 jobs. However, there is a 350 qsub job limit per account on the cluster I am using. I was therefore wondering whether there is a way in bash scripting to check the number of currently active qsub jobs, so that I could automate the process of initiating new jobs.

Have you already tried job arrays? You didn't specify which scheduler you are using (PBS, OGE, ...), but there should be a way to define a job array and, within the array, a limit on the number of tasks actually running at a time. In PBS,
#PBS -t 1-1000%100
creates a one-thousand-task job array while limiting the number of tasks actually running at any time to one hundred.
If you really want to check the number of active jobs yourself in order to automate the submission of new ones, the qstat output should help you, but this should really be the duty of your scheduler, not yours.
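If your scheduler does not offer such an array limit, a simple throttling loop in bash can do the bookkeeping for you. The sketch below is only an illustration, not a tested recipe: it assumes your job scripts are listed one per line in a file called jobs.txt (a hypothetical name) and that qstat -u "$USER" prints one line per active job containing your user name.
#!/bin/bash
# Throttle qsub submissions so the number of queued/running jobs stays under the limit.
MAX_JOBS=300          # stay safely below the 350-job account limit
SLEEP_INTERVAL=60     # seconds to wait before re-checking the queue

while read -r jobscript; do
    # Count this user's active jobs and wait while the limit is reached.
    while [ "$(qstat -u "$USER" 2>/dev/null | grep -c "$USER")" -ge "$MAX_JOBS" ]; do
        sleep "$SLEEP_INTERVAL"
    done
    qsub "$jobscript"
done < jobs.txt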

Related

How do I start a group of jobs on an SGE grid that will all start at the same time?

I am using a tool which spawns parallel jobs onto an SGE grid. When the grid is close to capacity some of the grid jobs start but others do not. This leads to wildly different results from the tool depending on the actual number of jobs that start.
So how can I get SGE to queue until all the grid slots are available and then start all the jobs at once?
Note that the grid slots will typically become available across a number of machines.
From the qsub man page:
-tcon y[es]|n[o]
Available for qsub only.
Can be used in conjunction with array jobs (see -t option) to
submit a concurrent array job.
For a concurrent array job either all tasks can be started in one
scheduling run or the whole job will stay pending.
Include this in your job script with
#$ -tcon y
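As an illustration, a minimal array job script combining -t and -tcon might look like the sketch below; the task range and the my_task command are placeholders, and -tcon is not available in every Grid Engine variant, so check your own qsub man page first.
#!/bin/bash
#$ -t 1-100      # array job with 100 tasks
#$ -tcon y       # start all tasks in one scheduling run, or keep the whole job pending
#$ -cwd
# Each task receives its index in SGE_TASK_ID.
./my_task "$SGE_TASK_ID"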

How to submit a job to any [subset] of nodes from nodelist in SLURM?

I have a couple of thousand jobs to run on a SLURM cluster with 16 nodes. These jobs should run only on a subset of 7 of the available nodes. Some of the tasks are parallelized and use all the CPU power of a single node, while others are single-threaded. Therefore, multiple jobs should run at the same time on a single node. None of the tasks should span multiple nodes.
Currently I submit each of the jobs as follow:
sbatch --nodelist=myCluster[10-16] myScript.sh
However, this parameter makes Slurm wait until the submitted job terminates, which leaves 3 nodes completely unused and, depending on the task (multi- or single-threaded), may also leave the currently active node at a low CPU load.
What are the best sbatch parameters to make Slurm run multiple jobs at the same time on the specified nodes?
You can work the other way around; rather than specifying which nodes to use (with the effect that each job is allocated all 7 of them), specify which nodes not to use:
sbatch --exclude=myCluster[01-09] myScript.sh
and Slurm will never allocate more than 7 nodes to your jobs. Make sure, though, that the cluster configuration allows node sharing, and that your myScript.sh contains #SBATCH --ntasks=1 --cpus-per-task=n with n the number of threads of each job.
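As a sketch, myScript.sh could then look like the following; the thread count of 4 and the my_program executable are placeholders, not part of the original answer.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # n = number of threads of this job
#SBATCH --exclude=myCluster[01-09] # could also be passed on the sbatch command line as above
# Run the (possibly multi-threaded) program on the single allocated node.
./my_program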
Some of the tasks are parallelized and use all the CPU power of a single node, while others are single-threaded.
I understand that you want the single-threaded jobs to share a node, whereas the parallel ones should be assigned a whole node exclusively?
multiple jobs should run at the same time on a single node.
As far as my understanding of SLURM goes, this implies that you must define CPU cores as consumable resources (i.e., SelectType=select/cons_res and SelectTypeParameters=CR_Core in slurm.conf).
Then, to constrain parallel jobs to a whole node, you can either use the --exclusive option (but note that the partition configuration takes precedence: you can't have shared nodes if the partition is configured for exclusive access), or use -N 1 --ntasks-per-node="number_of_cores_in_a_node" (e.g., -N 1 --ntasks-per-node=8).
Note that the latter will only work if all nodes have the same number of cores.
None of the tasks should span multiple nodes.
This should be guaranteed by -N 1.
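Under those assumptions, the two submission patterns might look like the sketch below; the script names and the 8-core node size are illustrative placeholders.
# Parallel job: claim a whole node, either exclusively ...
sbatch --exclude=myCluster[01-09] -N 1 --exclusive parallelScript.sh
# ... or by requesting all of a node's cores (here assumed to be 8)
sbatch --exclude=myCluster[01-09] -N 1 --ntasks-per-node=8 parallelScript.sh
# Single-threaded job: one core, so several such jobs can share a node
sbatch --exclude=myCluster[01-09] --ntasks=1 --cpus-per-task=1 serialScript.sh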
Actually, I think the way to go is to set up a 'reservation' first, as described in this presentation: http://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf (last slide).
Scenario: Reserve ten nodes in the default SLURM partition starting at noon and with a duration of 60 minutes occurring daily. The reservation will be available only to users alan and brenda.
scontrol create reservation user=alan,brenda starttime=noon duration=60 flags=daily nodecnt=10
Reservation created: alan_6
scontrol show res
ReservationName=alan_6 StartTime=2009-02-05T12:00:00
EndTime=2009-02-05T13:00:00 Duration=60 Nodes=sun[000-003,007,010-013,017] NodeCnt=10 Features=(null) PartitionName=pdebug Flags=DAILY Licenses=(null)
Users=alan,brenda Accounts=(null)
# submit job with:
sbatch --reservation=alan_6 myScript.sh
Unfortunately I couldn't test this procedure, probably due to a lack of privileges.

How many MapReduce jobs can be run simultaneously

I want to know how many MapReduce jobs can be submitted/run simultaneously in a single-node Hadoop environment. Is there any limit?
From a configuration standpoint, there's no limit I'm aware of. You can set the number of map and reduce slots to whatever you want. Practically, though, each slot has to spin up a JVM capable of running some hadoop code, which requires some amount of memory, so eventually you would run out of memory on your machine. You might also have to configure job queues cleverly in order to run a ton at the same time.
Now, what is possible is a very different question than what is a good idea...
You can submit as many jobs as you want; they will be queued up and the scheduler will run them FIFO (by default) as resources become available. The number of jobs Hadoop actually executes at once depends on the factors John describes above.
The number of Reducer slots is set when the cluster is configured. This will limit the number of MapReduce jobs based on the number of Reducers each job requests. Mappers are generally more limited by number of DataNodes and # of processors per node.

Increase the number of map and reduce functions

I have a question.
I want to increase the number of my map and reduce functions to the number of my input rows. When I execute System.out.println(conf.getNumReduceTasks()) and System.out.println(conf.getNumMapTasks()), it shows me:
1 1
And when I execute conf.setNumReduceTasks(1000000) and conf.setNumMapTasks(1000000) and then run the println calls again, it shows me:
1000000 1000000
But I see no change in my MapReduce program's execution time. My input comes from Cassandra; specifically, it is a Cassandra column family with about 362,000 rows.
I want to set the number of my map and reduce functions to the number of input rows.
What should I do?
Setting the number of map/reduce tasks for your map/reduce job does define how many map/reduce processes will be used to process your job. Consider if you really need so many java processes.
That said, the number of map tasks is mostly determined automatically; setting the number of map tasks is only a hint that can increase the number of maps that were determined by Hadoop.
For reduce tasks, the default is 1 and the practical limit is around 1,000.
See: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
It's also important to understand that each node of your cluster also has a maximum number of map/reduce tasks that can execute concurrently. This is set by the following configuration settings:
mapred.tasktracker.map.tasks.maximum
and
mapred.tasktracker.reduce.tasks.maximum
The default for both of these is 2.
So even if you increase the number of map/reduce tasks for a job, it will still be limited by the number of tasks that can run simultaneously per node. This may be one reason you aren't seeing a change in your job's execution time.
See: http://hadoop.apache.org/docs/stable/mapred-default.html
The summary is:
Let Hadoop determine the number of maps, unless you want more map tasks.
Use the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum settings to control how many tasks can run at one time per node.
The maximum number of reduce tasks should be somewhere between 1 and 2 times (mapred.tasktracker.reduce.tasks.maximum * #nodes). You also have to take into account how many map/reduce jobs you expect to run at once, so that a single job doesn't consume all the available reduce slots.
A value of 1,000,000 is almost certainly too high for either setting; it's not practical to run that many java processes. I expect that such high values are simply being ignored.
After setting mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to the number of tasks your nodes are able to run simultaneously, try increasing your job's map/reduce task counts incrementally, for example from the command line as sketched below.
You can see the actual number of tasks used by your job in the job.xml file to verify your settings.
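If your driver goes through ToolRunner/GenericOptionsParser, these per-job values can also be passed on the command line instead of calling the setters in code; myjob.jar, MyDriver, and the paths below are placeholders, not names from the original question.
# Hint 50 map tasks and request 4 reduce tasks for this job only
hadoop jar myjob.jar MyDriver \
    -D mapred.map.tasks=50 \
    -D mapred.reduce.tasks=4 \
    input_path output_path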

Limiting the number of mappers running on Hadoop Streaming

Is it possible to limit the number of mappers running for a job at any given time using Hadoop Streaming? For example, I have a 28-node cluster that can run 1 task per node. If I have a job with 100 tasks, I'd like to use only, say, 20 of the 28 nodes at any point in time. I'd like to limit some jobs this way because they may contain many long-running tasks, and I sometimes want to run faster jobs and be sure they can start immediately rather than waiting for the long-running job to finish.
I saw this question and the title is spot on but the answers don't seem to address this particular issue.
Thanks!
While I am not aware of any "node-wise" capacity scheduling, there is an alternative scheduler built for a very similar case: the Capacity Scheduler.
http://hadoop.apache.org/common/docs/r0.19.2/capacity_scheduler.html
You can define a special queue for potentially long jobs and another queue for short jobs, and this scheduler will take care to keep some capacity always available for each queue's jobs.
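Assuming such a queue has been defined in the scheduler configuration (the queue name long_jobs and the file names below are placeholders), a streaming job could be directed to it at submission time roughly as follows; the exact path to the streaming jar varies between Hadoop distributions.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -D mapred.job.queue.name=long_jobs \
    -input /data/in -output /data/out \
    -mapper my_mapper.py -reducer my_reducer.py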
The following option may make sense if the amount of work in each mapper is substantial, since this strategy involves the overhead of reading up to 20 counters in each map invocation.
Create a group of counters with the group name MY_TASK_MAPPERS, and make the keys MAPPER<1..K>, where K is the maximum number of mappers you want. Then, in the mapper, iterate through the counters until one of them is found to be 0. Place the machine's IP address (with the dots stripped out) in that counter as a long value, effectively assigning that machine to that mapper. If instead all K counters are already taken, just quit the mapper without doing anything.
