Reallocate resources to a reduced job - HPC 2012 allocation

I have 2 jobs, submitted under 2 templates in order to use priority levels.
I'm using Queued Scheduling, with Graceful Pre-emption and all the automatic resource adjustment enabled.
I submit the low priority job. At a later stage I submit the higher priority job. HPC 2012 automatically takes resources from the lower priority job as the running tasks within it complete.
HPC then does NOT reallocate those resources back to the lower priority job when the higher priority one completes.
This gets even worse because the application submitting the tasks/jobs adds further tasks to running jobs as certain tasks complete, and I typically end up with only a handful of cores allocated to the low priority job, despite there being several hundred cores free once the high priority job completes.
Is there a way to change the HPC configuration so that the resources are reallocated?

OK, so the HPC Pack 2012 SP1 notes state that this exact issue has been fixed in SP1 for SOA mode. To clarify, it HAS also been fixed for Batch mode.
This is how it works (when using Batch mode, queued scheduling with graceful pre-emption):
Submit a low priority job
Scheduler allocates resources to the tasks under the low priority job.
Submit a high priority job
As tasks under the low priority job complete, the resources are allocated to the tasks under the high priority job.
Under HPC Pack 2012, the low priority job's resources could be run right down to a single core, and the job would carry on trying to finish with that single core (unless you also set conditions on the minimum number of resources allocated).
Under HPC Pack 2012 SP1, it will allocate resources to the high priority job more aggressively (it will take ALL the resources from the low job as the already-running tasks finish) and label the low priority job's remaining tasks as cancelled in the Activity Log.
As resources free up when the high priority job's tasks complete, they are re-allocated to the low priority job, which now just appears as a queued job, so normal prioritisation/scheduling occurs, exactly as if the low priority job had been submitted separately. It therefore gets full resources re-allocated.
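To make that cycle concrete, here is a toy Python model of the behaviour just described. It is not HPC Pack code and it ignores the cancel/re-queue bookkeeping; the core and task counts are made up purely for illustration.

# Toy model of the SP1 behaviour described above - not HPC Pack code,
# just an illustration of how cores move between the two jobs.
CORES = 8                                   # hypothetical cluster size
low = {"queued": 20, "running": CORES}      # low priority job holds every core
high = {"queued": 12, "running": 0}         # high priority job arrives later

tick = 0
while low["queued"] or low["running"] or high["queued"] or high["running"]:
    tick += 1
    # One tick: every running task finishes and frees its core.
    freed = low["running"] + high["running"]
    low["running"] = high["running"] = 0
    # SP1-style pre-emption: the high priority job grabs every freed core first.
    grab = min(freed, high["queued"])
    high["running"], high["queued"], freed = grab, high["queued"] - grab, freed - grab
    # Only the cores the high priority job cannot use go back to the low job.
    grab = min(freed, low["queued"])
    low["running"], low["queued"] = grab, low["queued"] - grab
    print(f"t={tick}: high running={high['running']:2d}, low running={low['running']:2d}")

Running it shows the low priority job dropping to zero cores while the high priority job ramps up, then getting the full allocation back as the high priority job drains.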

Related

Does the Windows scheduler sometimes fail to preempt a running thread immediately to let a higher-priority thread run?

My application operates on pairs of long vectors - say it adds them together to produce a vector result. Its rules state that it must completely finish with one pair before it can be given another. I would like to use multiple threads to speed things up. I am running Windows 10.
I created an OpenMP parallel for construct and divided the vector among all the threads of the team. All threads start, all threads run pretty fast, so the multithreading is effective.
But the speedup is slight, and the reason is that some of the time, one of the worker threads takes way longer than usual. I have instrumented the operation, and I see that sometimes the worker threads take a long time to start - delay varies from 20 microseconds on average to dozens of milliseconds depending on system load. The master thread does not show this delay.
That makes me think that the scheduler is taking some time to start the worker threads. The master thread is already running, so it doesn't have to wait to be started.
But here is the nub of the question: raising the priority of the process doesn't make any difference. I can raise it to high priority or even realtime priority, and I still see that startup of the worker threads is often delayed. It looks like the Windows scheduler is not fully preemptive, and sometimes lets a lower-priority thread run when a higher-priority one is eligible. Can anyone confirm this?
I have verified that the worker threads are created with the default OS priority, namely the base priority of the class of the master process. This should be higher than the priority of any running thread, I think. Or is it normal for there to be some thread with realtime priority that might be blocking my workers? I don't see one with Task Manager.
I guess one last possibility is that the task switch might take 20 usec. Is that plausible?
I have a 4-core system without hyperthreading.

How to control how many tasks to run per executor in PySpark [duplicate]

I don't quite understand the spark.task.cpus parameter. It seems to me that a "task" corresponds to a "thread" or a "process", if you will, within the executor. Suppose that I set spark.task.cpus to 2.
How can a thread utilize two CPUs simultaneously? Couldn't it require locks and cause synchronization problems?
I'm looking at the launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of cpus per task" there. So where/how does Spark eventually allocate more than one CPU to a task in standalone mode?
To the best of my knowledge, spark.task.cpus controls the parallelism of tasks in your cluster in the case where some particular tasks are known to have their own internal (custom) parallelism.
In more detail:
We know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1 then you will have spark.cores.max concurrent Spark tasks running at the same time.
You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (maybe each of your tasks spawns two threads, interacts with external tools, etc.). By setting spark.task.cpus accordingly, you become a good "citizen". Now if you have spark.cores.max=10 and spark.task.cpus=2, Spark will only create 10/2=5 concurrent tasks. Given that your tasks need (say) 2 threads internally, the total number of executing threads will never be more than 10. This means that you never go above your initial contract (defined by spark.cores.max).
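A minimal PySpark sketch of that configuration, using the same numbers as above (this assumes a standalone cluster and that the script is launched via spark-submit; the app name and RDD are made up):

from pyspark import SparkConf, SparkContext

# Values from the example above: 10 cores in total, 2 cores reserved per
# task, so Spark will run at most 10 / 2 = 5 tasks at once.
conf = (SparkConf()
        .setAppName("task-cpus-demo")
        .set("spark.cores.max", "10")
        .set("spark.task.cpus", "2"))

sc = SparkContext(conf=conf)

# Each task can now use ~2 threads of its own (e.g. a multithreaded native
# library) without oversubscribing the executor cores.
rdd = sc.parallelize(range(100), numSlices=20)
print(rdd.map(lambda x: x * x).sum())

sc.stop()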

Detecting CPU load on different machines

I am trying to create a background task scheduler for my process, which needs to schedule compute-intensive tasks in parallel while maintaining the responsiveness of the UI.
Currently, I am comparing CPU usage (percentage) against a threshold (~50%) to decide whether the scheduler should start a new task, and it sort of works.
This program can run on a variety of hardware configurations (e.g. processor speed, number of cores), so a 50% limit can be too harsh or too soft for certain configurations.
Is there a good way to take the different parameters of the CPU configuration (e.g. cores, speed) into account and dynamically come up with a threshold based on the hardware?
My suggestions:
Run as many threads as there are CPUs in the system.
Set the priority of each thread to idle (lowest).
In each thread's main loop, do the smallest sleep possible, e.g. usleep(1).
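Here is a rough Python sketch of the same idea (Python is an assumption on my part, since the question doesn't name a language). Python cannot portably lower the priority of an individual thread, so this version uses low-niceness worker processes instead; on Windows you would lower the process priority class (e.g. with psutil) rather than call os.nice.

import multiprocessing as mp
import os

def lower_priority():
    # Drop the worker's scheduling priority so UI/foreground work always wins.
    # os.nice() is POSIX-only; on Windows, lower the process priority class
    # instead (e.g. psutil.Process().nice(...)).
    try:
        os.nice(19)
    except (AttributeError, OSError):
        pass

def compute(n):
    # Stand-in for a compute-intensive background task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [2_000_000] * 16
    # One worker per CPU, each initialised at the lowest priority.
    with mp.Pool(processes=os.cpu_count(), initializer=lower_priority) as pool:
        for result in pool.imap_unordered(compute, tasks):
            print(result)

The effect is that the OS scheduler keeps the UI responsive on any hardware, instead of the application guessing at a CPU-percentage threshold.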

Possible to parallelize SonarQube background tasks?

In SonarQube (5.6.4 LTS) there is a view where background (project analysis) tasks are visualized (Administration / Projects / Background Tasks). It seems like the tasks are run in sequence (one at a time). Some tasks can take 40 minutes, which means other projects are queued up waiting for this task to finish before they can be started.
Is it possible to configure the SonarQube Compute Engine so that these tasks are run in parallel instead?
As per documentation on Background Tasks:
You can control the number of Analysis Reports that can be processed at a time in $SQ_HOME/conf/sonar.properties (see sonar.ce.workerCount - Default is 1).
Careful though: blindly increasing sonar.ce.workerCount without proper monitoring is just shooting in the dark. The underlying resources available (CPU/RAM) are fixed (all workers run in the Compute Engine JVM), and you don't want to end up with very limited memory for each task and/or heavy CPU context switching. That would kill performance for every task, compared to running only a few in parallel, which is much more efficient.
In short: it is better to have at most 2 tasks in parallel that each complete in under a minute (i.e. at most 10 minutes to run 20 tasks) than 20 sluggish tasks in parallel that take 15 minutes overall because they struggle to share the same CPU/RAM.
Update: with SonarQube 6.7+ and the new licence plans, "parallel processing of reports" has become a commercial feature and is only available in the Enterprise Edition.

Hadoop FairScheduler doesn't utilize all map slots

Running a 12-node Hadoop cluster with a total of 48 map slots available. Submitting a bunch of jobs, but I never see all map slots being utilized. The maximum number of busy slots floats around 30-35, but never gets close to 48. Why?
Here's the fair scheduler configuration:
<?xml version="1.0"?>
<allocations>
  <pool name="big">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <pool name="medium">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxRunningJobs>3</maxRunningJobs>
    <weight>3.0</weight>
  </pool>
  <pool name="small">
    <minMaps>20</minMaps>
    <minReduces>20</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>100.0</weight>
  </pool>
</allocations>
The idea is that jobs in the small queue should always have priority, the next most important queue is 'medium', and the least important is 'big'. Sometimes I see jobs in the medium or big queue starve although there are map slots available that are not used.
I think the issue may be caused by the maxRunningJobs option not being taken into account while computing shares for jobs. I think that parameter is only applied after slots (from the exceeding job) have already been assigned to a tasktracker. That happens every n seconds from the UpdateThread.update() -> updateRunnability() method of the FairScheduler class. I suppose that in your case, after some time, jobs from the "medium" and "big" pools get a bigger deficit than jobs from the "small" pool, which means the next task will be scheduled from a job in the medium or big pool. When that task is scheduled, the maxRunningJobs restriction takes effect and puts the exceeding jobs into a non-runnable state. The same thing happens on the following update.
This is just my guess after looking at some of the fair scheduler source. If you can, I would try removing maxRunningJobs from the config and see how the scheduler behaves without that limitation, and whether it then takes all of your slots.
The weights for the pools seem too high to me. A weight of 100 means that this pool should get 100x more slots than the default pool. I would try lowering this number by a few factors if you want fair sharing between your pools; otherwise jobs from other pools will only be launched once they meet their deficit (which is calculated from the running tasks and minShare).
Another possible reason why jobs are starving is the delay scheduling that is included in the fair scheduler with the aim of improving computation locality. This could probably be improved by increasing the replication factor, but I do not think that is your case.
Some docs on the fair scheduler..
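To put rough numbers on the weight issue, here is a quick back-of-the-envelope Python sketch that splits the 48 map slots purely in proportion to the configured pool weights (the real fair scheduler also honours minMaps, deficits and per-job weights, so this is only an approximation):

# Naive proportional split of the 48 map slots by pool weight.
capacity = 48
weights = {"small": 100.0, "medium": 3.0, "big": 1.0}  # big uses the default weight

total = sum(weights.values())
for pool, weight in weights.items():
    print(f"{pool:>6}: ~{capacity * weight / total:.1f} slots")

# Prints roughly:
#  small: ~46.2 slots
# medium: ~1.4 slots
#    big: ~0.5 slots

With weights of 100/3/1, the medium and big pools are entitled to only one or two slots between them, which fits the starvation of the medium and big queues mentioned above.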
The starvation probably occurs because the priority of the small pool is really, really high (2^100 more than big, 2^97 more than medium). When all the jobs are ordered by priority and you have waiting jobs in the small pool, the next job in that pool needs 20 slots and has higher priority than anything else, so the open slots just sit there waiting until a currently running job frees them; there are no "unneeded slots" to divide among the other priorities.
See the highlights from the implementation notes of the fair scheduler:
"The fair shares are calculated by dividing the capacity of the
cluster among runnable jobs according to a "weight" for each job. By
default the weight is based on priority, with each level of priority
having 2x higher weight than the next (for example, VERY_HIGH has 4x
the weight of NORMAL). However, weights can also be based on job sizes
and ages, as described in the Configuring section. For jobs that are
in a pool, fair shares also take into account the minimum guarantee
for that pool. This capacity is divided among the jobs in that pool
according again to their weights."
Finally, when limits on a user's running jobs or a pool's running jobs
are in place, we choose which jobs get to run by sorting all jobs in
order of priority and then submit time, as in the standard Hadoop
scheduler. Any jobs that fall after the user/pool's limit in this
ordering are queued up and wait idle until they can be run. During
this time, they are ignored from the fair sharing calculations and do
not gain or lose deficit (their fair share is set to zero).
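
For reference, the priority-to-weight rule quoted above ("each level of priority having 2x higher weight than the next") works out as follows; a quick Python check using Hadoop's standard JobPriority level names:

# Priority-to-weight mapping implied by the quoted notes: each level is
# 2x the next, so VERY_HIGH ends up at 4x NORMAL.
levels = ["VERY_LOW", "LOW", "NORMAL", "HIGH", "VERY_HIGH"]
weights = {name: 2 ** (i - levels.index("NORMAL")) for i, name in enumerate(levels)}
print(weights)  # {'VERY_LOW': 0.25, 'LOW': 0.5, 'NORMAL': 1, 'HIGH': 2, 'VERY_HIGH': 4}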
