slurm limit number of spwaned processes - fork

I am a newbie trying to install/administer slurm. I want to limit the amount of forking a slurm job can do. I used stress command to see the CPU utilization by slurm.
When I run this batch script
#SBATCH -p Test -c 1
stress -c 1
The job runs fine with one core used 100 percent. But this script
#SBATCH -p Test -c 1
stress -c 20
also runs but the top command gives list of 20PIDs forked with cpu utilization of 5 percent each. This makes sense as the total utilization is 1 CPU core 100 percent. This makes load averages go crazy which I learned by googling, are not a correct view of system load. I have 2 questions
Is it possible in slurm to limit such a behavior from the admin config by killing the second run. My various attempts have so far yielded nothing. The slurm is configured with cgroup and kills over memory jobs fine. No MPI is used or configured.
Does this behavior cause inefficiency because of process waiting times ?
I tried setting these drastic params to check if something happens.
MaxStepCount=1
MaxTasksPerNode=2
But surprisingly nothing happens and I can submit many more jobs after this.

Slurm's job is to allocate computational resources to user jobs. The lowest unit of computation manageable is referred to in the documentation as the CPU. This refers to processing threads/ execution cores, not physical cores. Slurm does not oversee how those resources are managed by the job. So no, nothing in Slurm can kill a job with too many userland threads.
Running that many threads would probably affect efficiency, yes. All those threads will cause increased context switching unless the job has enough cpu threads to handle them.
MaxStepCount and MaxTasksPerNode are for jobs. "Tasks" in this context are not userland threads but separate processes launched by a job step.
I hope that helps.

Related

How to control how many tasks to run per executor in PySpark [duplicate]

I don't quite understand spark.task.cpus parameter. It seems to me that a “task” corresponds to a “thread” or a "process", if you will, within the executor. Suppose that I set "spark.task.cpus" to 2.
How can a thread utilize two CPUs simultaneously? Couldn't it require locks and cause synchronization problems?
I'm looking at launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of cpus per task" here. So where/how does Spark eventually allocate more than one cpu to a task in the standalone mode?
To the best of my knowledge spark.task.cpus controls the parallelism of tasks in you cluster in the case where some particular tasks are known to have their own internal (custom) parallelism.
In more detail:
We know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1 then you will have #spark.cores.max number of concurrent Spark tasks running at the same time.
You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (maybe each of your task spawns two threads, interacts with external tools, etc.) By setting spark.task.cpus accordingly, you become a good "citizen". Now if you have spark.cores.max=10 and spark.task.cpus=2 Spark will only create 10/2=5 concurrent tasks. Given that your tasks need (say) 2 threads internally the total number of executing threads will never be more than 10. This means that you never go above your initial contract (defined by spark.cores.max).

Puma clustering benefits for site which handles lots of uploads/downloads

I'm trying to understand the benefits of using puma clustering. The GitHub says that the number puma workers should be set to the number of cpu cores, and the default number of threads for each is 0-16. The worker processes can run in parallel while the threads run concurrently. It was my understanding that The MRI GIL only allows one thread across all cores to run Ruby code, so how does puma enable things to run in parallel /provide benefits over running one worker process with double the amount of threads? The site I'm working on is heavily IO bound, handling several uploads and downloads at the same time - any config suggestions for this set up are also welcome.
The workers in clustered mode will actually spawn new child processes each of which has its own "GIL". Only one thread in a single process can be running code at one time, thus having a process per cpu core works well because each cpu can only be doing one thing at a time. It also makes sense to run multiple threads per process because if a thread is waiting for IO it will allow another thread to execute.

Detecting CPU load on different machines

I am trying to create a background task scheduler for my process, which needs to schedule the tasks(compute intensive) parallelly while maintaining the responsiveness of the UI.
Currently, I am using CPU usage(percentage) to against a threshold (~50%) for the scheduler to start a new task, and it sort of works fine.
This program can run on a variety hardware configurations( e.g processor speed, number of cores), so 50% limit can be too harsh or soft for certain configurations.
Is there any good way to include different parameters of CPU configuration e.g cores, speed; which can dynamically come up with a threshold number based on the hardware configuration?
My suggestions:
Run as many threads as CPUs in the system.
Set the priority of each thread to an idle (lowest)
In the thread main loop do a smallest sleep possible, i.e. usleep(1)

How many processes can be run simultaneously which called by java Runtime.exec()?

I just create 1000 thread and each thread call Runtime.exec() to start a process.
But when i watching parallel run process by
watch -n 1 'ps -ef|grep "AppName"'
I only found 4 processes run simultaneously at the most.
Most time it only run 2 or 1 process.
Does Runtime.exce() has a limit on process run parallel?
You only get parallelism when you have many processors or different operations going on (e.g. a slow I/O process that is run in a separate thread while the main thread continues).
If you have more threads than cores, all running the same process, all you get is time slicing as the operating system gives each thread some time.

Individual Spark Task consume more time on computation if more cores are assigned

I am running a spark job with input file of size 6.6G (hdfs) with master as local. My Spark Job with 53 partitions completed quickly when I assign local[6] than local[2], however the individual task takes more computation time when number of cores are more. Say if I assign 1 core(local[1]) then each task takes 3 secs where the same goes up to 12 seconds if I assign 6 cores (local[6]). Where the time gets wasted? The spark UI shows increase in computation time for each task in local[6] case, I couldn't understand the reason why the same code takes different computation time when more cores are assigned.
Update:
I could see more %iowait in iostat output if I use local[6] than local[1]. Please let me know this is the only reason or any possible reasons. I wonder why this iowait is not reported in sparkUI. I see the increase in computing time than iowait time.
I am assuming you are referring to spark.task.cpus and not spark.cores.max
With spark.tasks.cpus each task get assigned more cores, but it doesn't necessarily have to use them. If you process is single threaded it really can't use them. You wind up with additional overhead without additional benefit and those cores are taken away from other single threaded tasks that can use them.
With spark.cores.max it is simply and overhead issue with transferring data around at the same time.

Resources