What is the recommended approach for writing makefile that has target recipes that spawn multiple processes? Please consider this example.
I have my Makefile as follows:
all: run1 run2 run3
run%:
<invoke script that spawns multiple processes>
Consider that I have a system with 4 cores. Consider that recipe for run1 creates 1 process, run2 creates 2 process, and run3 creates 3 processes. There are some IO intensive processes and some are CPU intensive processes. Also, I would like to enable parallelism with make.
What is the recommended approach of figuring out the number of jobs to use for make file? If I use 4, one of the processes may be starved of CPU. If I use a lesser number of jobs, I may end up not utilizing my CPU efficiently.
Is there an option in Makefile to handle targets that spawn multiple processes? Is there an option in make that suggests the make process to consider a target as 2 jobs rather than 1?
Related
I am a newbie trying to install/administer slurm. I want to limit the amount of forking a slurm job can do. I used stress command to see the CPU utilization by slurm.
When I run this batch script
#SBATCH -p Test -c 1
stress -c 1
The job runs fine with one core used 100 percent. But this script
#SBATCH -p Test -c 1
stress -c 20
also runs but the top command gives list of 20PIDs forked with cpu utilization of 5 percent each. This makes sense as the total utilization is 1 CPU core 100 percent. This makes load averages go crazy which I learned by googling, are not a correct view of system load. I have 2 questions
Is it possible in slurm to limit such a behavior from the admin config by killing the second run. My various attempts have so far yielded nothing. The slurm is configured with cgroup and kills over memory jobs fine. No MPI is used or configured.
Does this behavior cause inefficiency because of process waiting times ?
I tried setting these drastic params to check if something happens.
MaxStepCount=1
MaxTasksPerNode=2
But surprisingly nothing happens and I can submit many more jobs after this.
Slurm's job is to allocate computational resources to user jobs. The lowest unit of computation manageable is referred to in the documentation as the CPU. This refers to processing threads/ execution cores, not physical cores. Slurm does not oversee how those resources are managed by the job. So no, nothing in Slurm can kill a job with too many userland threads.
Running that many threads would probably affect efficiency, yes. All those threads will cause increased context switching unless the job has enough cpu threads to handle them.
MaxStepCount and MaxTasksPerNode are for jobs. "Tasks" in this context are not userland threads but separate processes launched by a job step.
I hope that helps.
I don't quite understand spark.task.cpus parameter. It seems to me that a “task” corresponds to a “thread” or a "process", if you will, within the executor. Suppose that I set "spark.task.cpus" to 2.
How can a thread utilize two CPUs simultaneously? Couldn't it require locks and cause synchronization problems?
I'm looking at launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of cpus per task" here. So where/how does Spark eventually allocate more than one cpu to a task in the standalone mode?
To the best of my knowledge spark.task.cpus controls the parallelism of tasks in you cluster in the case where some particular tasks are known to have their own internal (custom) parallelism.
In more detail:
We know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1 then you will have #spark.cores.max number of concurrent Spark tasks running at the same time.
You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (maybe each of your task spawns two threads, interacts with external tools, etc.) By setting spark.task.cpus accordingly, you become a good "citizen". Now if you have spark.cores.max=10 and spark.task.cpus=2 Spark will only create 10/2=5 concurrent tasks. Given that your tasks need (say) 2 threads internally the total number of executing threads will never be more than 10. This means that you never go above your initial contract (defined by spark.cores.max).
In MPI, is it possible to schedule processes in such a way that one core is mapped to one process, the process finishes and then the core starts working on another process in a situation where the number of processes > number of cores? If so, how exactly is it done? As in, using mpiexec --bind-to-core -bycore or using a hostfile, rankfile etc.
Would really appreciate any input.
Assume im trying to run parallel program with 3 different tasks on quad core processor
my question is ,when these tasks run simultaneously ,will they be computed on each core of processor
or in what way they are executed simultaneously?
If you are using c# and parallel lib then yes, they would get queued up in the thread pool and executed in parallel, but there are few other factors that are very important to consider.
Such as:
- Is there is any shared data?
- Does one process need to wait on another?
Also order of execution is not guaranteed.
I am writing a server application that is thread pool based(IOCP). But I don't know how many threads are appropriate. Is the thread number associated with the number of processor cores?
If your work items never block, use threads = cores. If your threads never need to be descheduled you can max out all cores by creating one thread per core.
If your work items sometimes block (which they shouldn't do much if you want to make best use of IOCP) you need more threads. You need to measure how many.
Multiple threads make up a process, and the number of threads is not dependent on the number of cores. A single core processor can handle a multi-thread process using various scheduling schemes. That said, if you have multiple cores on your processor, you can have different threads run concurrently. So to run multiple threads at the same time, you need multiple cores, but to run multiple threads, but not necessarily simultaneously (can seem simultaneous though), you can use a single core by implementing a scheduling system.
Some useful wiki pages for you:
http://en.wikipedia.org/wiki/Computer_multitasking
http://en.wikipedia.org/wiki/Thread_%28computing%29
http://en.wikipedia.org/wiki/Input/output_completion_port
http://en.wikipedia.org/wiki/Scheduling_%28computing%29
http://en.wikipedia.org/wiki/Thread_pool_pattern