If I run some jobs in parallel like this:
#!/bin/bash
for i in $(seq 1 100)
do
./program "data$i.txt" &   # all 100 jobs are launched in the background at once
done
does this mean that I need 100 cores? Or, if I don't have 100 cores, will some of the jobs wait, or will they all run on the smaller number of cores, with more than one job allocated per core? And if I do need 100 cores, what should I do to run 10 at a time, without having to change the for loop to go from 1 to 10 and run the bash file 10 times?
The operating system is responsible for process and thread scheduling.
Or in case I don't have 100 cores some of the jobs will wait
Yes, the jobs will wait. But it probably won't be apparent to you. One job won't wait for another job to finish before it begins. As each job is running, the operating system's scheduling algorithm may interrupt the process executing the job and yield the CPU to another process. See: Scheduling
Excerpt:
Process scheduler
The process scheduler is a part of the operating system that decides which process runs at a certain point in time. It usually has the ability to pause a running process, move it to the back of the running queue and start a new process; such a scheduler is known as a preemptive scheduler, otherwise it is a cooperative scheduler.
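As for running only 10 at a time without splitting the loop: a minimal sketch, assuming a reasonably recent bash (4.3 or later for wait -n) and using the same ./program placeholder as in the question, is to count the background jobs and block before launching more:
#!/bin/bash
max_jobs=10
for i in $(seq 1 100)
do
    # if max_jobs are already running in the background, wait for any one of them to exit
    while (( $(jobs -rp | wc -l) >= max_jobs ))
    do
        wait -n
    done
    ./program "data$i.txt" &
done
wait    # wait for the last batch to finish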
Related
I am a newbie trying to install and administer Slurm. I want to limit the amount of forking a Slurm job can do. I used the stress command to observe the CPU utilization of a Slurm job.
When I run this batch script
#!/bin/bash
#SBATCH -p Test -c 1
stress -c 1
The job runs fine, with one core used at 100 percent. But this script
#!/bin/bash
#SBATCH -p Test -c 1
stress -c 20
also runs, but top shows a list of 20 forked PIDs with a CPU utilization of 5 percent each. This makes sense, as the total utilization is still one CPU core at 100 percent. It also makes the load average go crazy, which, as I learned by googling, is not an accurate view of system load. I have two questions:
Is it possible in Slurm to limit this kind of behavior from the admin config, for example by killing the second run? My various attempts have so far yielded nothing. Slurm is configured with cgroups and kills over-memory jobs fine. No MPI is used or configured.
Does this behavior cause inefficiency because of process waiting times?
I tried setting these drastic parameters to see whether anything happens.
MaxStepCount=1
MaxTasksPerNode=2
But surprisingly nothing happens, and I can still submit many more jobs after this.
Slurm's job is to allocate computational resources to user jobs. The lowest manageable unit of computation is referred to in the documentation as the CPU. This refers to processing threads / execution cores, not physical cores. Slurm does not oversee how those resources are managed by the job. So no, nothing in Slurm can kill a job with too many userland threads.
Running that many threads would probably affect efficiency, yes. All those threads will cause increased context switching unless the job has enough CPU threads to handle them.
MaxStepCount and MaxTasksPerNode are for jobs. "Tasks" in this context are not userland threads but separate processes launched by a job step.
I hope that helps.
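As an aside, the behaviour described in the question is easy to reproduce outside Slurm. A rough sketch, assuming the stress and taskset utilities are installed: pin 20 CPU workers to a single logical CPU and each one ends up at roughly 5 percent.
# run 20 CPU-burning workers for 30 seconds, all pinned to logical CPU 0
taskset -c 0 stress -c 20 -t 30 &
sleep 2
# each stress worker shows up at roughly 100/20 = 5 percent CPU
top -b -n 1 | grep stress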
My question is based on THIS question.
Based on that question, I should consider using --array=0-60000%200 to limit the number of jobs running in parallel in Slurm to 200. It seems to me that it takes up to a minute to launch a new job every time an old job finishes. Given the number of jobs that I am planning to run, I might be wasting a lot of time this way.
I wrote a "most probably" very inefficient alternative, consisting in a script that launches the jobs, checking the number of jobs in the queue and adding jobs if I am still bellow the max number of jobs allowed and while I reached the max number of parallel jobs, sleep for 5 seconds, as follows:
#!/bin/bash
# iterate procedure $1 times. $1=60000
for ((i=0; i<=$1; i++))
do
    # wait until the number of my queued/running jobs drops below the limit
    q=$(squeue -u myuserName | wc -l)   # I don't care about +/-1 lines (e.g. the header)
    while [ "$q" -gt 200 ]              # max number of parallel jobs set to 200
    do
        sleep 5
        q=$(squeue -u myuserName | wc -l)
    done
    # run the job with sbatch
    sbatch...
done
It seems to do a better job than my previous method; nevertheless,
I would like to know how inefficient this implementation really is, and why.
Could I be harming the scheduling efficiency of other users on the same cluster?
Thank you.
SLURM needs some time to process the job list and decide which job should be the next to run, especially if the backfill scheduler is in place and there are lots of jobs in the queue. You are not losing one minute per job because you are using a job array; it is SLURM that needs that minute to decide, and it will need the same minute for any other job of any other user, with or without job arrays.
By using your approach your jobs also lose priority: every time one of your jobs finishes, you launch a new one, and that new job will be the last in the queue. Also, SLURM will have to manage some hundreds of independent jobs instead of a single one that accounts for the 60000 you need.
If you are alone in the cluster, maybe there is no big difference between the two approaches, but if your cluster is full, your manual approach will put a slightly higher load on SLURM and your jobs will finish quite a lot later compared to the job array approach (simply because with the job array, once the array gets to be first in line, all 60000 tasks are first in line, compared to being last in line every time one of your jobs finishes).
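For reference, a minimal sketch of the array approach (the ./program line and data file names are placeholders; SLURM_ARRAY_TASK_ID is set by SLURM for each array task):
#!/bin/bash
#SBATCH --array=0-60000%200    # 60001 array tasks, at most 200 running at once
./program "data${SLURM_ARRAY_TASK_ID}.txt"
The whole thing is submitted once with a single sbatch call, and SLURM keeps 200 tasks running until the array is exhausted.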
I have a Core i7 processor with 8 logical processors.
I'm trying to run a parallel task in .NET Core 2.2 using Parallel.For.
When I measure the start times, there are 9 tasks started in parallel.
Isn't it supposed to be just 8?
Below you can see:
i => [ThreadId],[ProcessorNumber] == starttime - endtime
(screenshot of the parallel task results)
You can start as many tasks in parallel as you want, but the processor only has 8 logical cores, so only 8 threads can execute simultaneously. The rest will always queue up and wait their turn.
So if you have 16 parallel processes, each taking 200 ms to run, you will run processes 1-8 in parallel for 200 ms, then 9-16 in parallel for 200 ms, for a total of 400 ms. If you had 4 logical cores, you would run processes 1-4, 5-8, 9-12 and 13-16 in parallel, for a total of 800 ms.
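You can see the same arithmetic with plain shell tools (a rough illustration only, not .NET; here xargs -P plays the role of the 8 or 4 cores):
# 16 jobs of 0.2s each, at most 8 at a time: roughly 0.4s of wall time
time seq 16 | xargs -P 8 -I{} sleep 0.2
# the same 16 jobs, at most 4 at a time: roughly 0.8s of wall time
time seq 16 | xargs -P 4 -I{} sleep 0.2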
I don't quite understand the spark.task.cpus parameter. It seems to me that a “task” corresponds to a “thread” or a "process", if you will, within the executor. Suppose that I set spark.task.cpus to 2.
How can a thread utilize two CPUs simultaneously? Couldn't it require locks and cause synchronization problems?
I'm looking at launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of cpus per task" here. So where/how does Spark eventually allocate more than one cpu to a task in the standalone mode?
To the best of my knowledge, spark.task.cpus controls the parallelism of tasks in your cluster in the case where some particular tasks are known to have their own internal (custom) parallelism.
In more detail:
We know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1, then you will have spark.cores.max concurrent Spark tasks running at the same time.
You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (maybe each of your tasks spawns two threads, interacts with external tools, etc.). By setting spark.task.cpus accordingly, you become a good "citizen". Now if you have spark.cores.max=10 and spark.task.cpus=2, Spark will only create 10/2=5 concurrent tasks. Given that your tasks need (say) 2 threads internally, the total number of executing threads will never be more than 10. This means that you never go above your initial contract (defined by spark.cores.max).
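As a sketch of how these two settings are typically passed at submission time (my_app.jar is just a placeholder):
# 10 cores in total, 2 cores per task  =>  at most 10/2 = 5 concurrent tasks
spark-submit \
  --conf spark.cores.max=10 \
  --conf spark.task.cpus=2 \
  my_app.jar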
I create 1000 threads and each thread calls Runtime.exec() to start a process.
But when I watch the processes running in parallel with
watch -n 1 'ps -ef|grep "AppName"'
I only see at most 4 processes running simultaneously.
Most of the time only 1 or 2 processes are running.
Does Runtime.exec() have a limit on how many processes can run in parallel?
You only get parallelism when you have many processors or different operations going on (e.g. a slow I/O process that is run in a separate thread while the main thread continues).
If you have more threads than cores, all running the same process, all you get is time slicing as the operating system gives each thread some time.
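A quick way to see this on Linux (assuming standard coreutils and procps): compare the number of logical CPUs with the number of processes that are actually runnable at a given instant; the rest are sleeping or waiting for their time slice.
nproc                            # number of logical CPUs
ps -eo state= | grep -c '^R'     # processes currently running or runnable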