Are OS threads tied to thread_num in different OpenMP parallel regions?

If I have multiple parallel regions, would the same OS thread be used for a given thread index (returned by omp_get_thread_num()) in those parallel regions?

It is not specified by the OpenMP standard (version 5.1), except for the primary thread (formerly called the master thread), assuming the two parallel regions are encountered by the same thread in your application:
The thread that encountered the parallel construct becomes the primary thread of the new team, with a thread number of zero for the duration of the new parallel region.
Note that even this may not hold if you use untied tasks in your parallel region:
The thread number may change during the execution of an untied task. The value returned by omp_get_thread_num is not generally useful during the execution of such a task region.

Related

Difference between boundedElastic() vs parallel() scheduler

I'm new to Project Reactor and trying to understand the difference between the boundedElastic() and parallel() schedulers. The documentation says that boundedElastic() is used for blocking tasks and parallel() for non-blocking tasks.
Why does Project Reactor need to address blocking scenarios at all, given that it is non-blocking in nature? Can someone please help me out with some real-world use cases for boundedElastic() vs parallel()?
The parallel flavor is backed by N workers (N = the number of CPUs), each based on a ScheduledExecutorService. If you submit N long-lived tasks to it, no more work can be executed, hence the affinity for short-lived tasks.
The elastic flavor is also backed by workers based on ScheduledExecutorService, except that it creates these workers on demand and pools them.
boundedElastic is the same as elastic; the difference is that you can limit the total number of threads.
https://spring.io/blog/2019/12/13/flight-of-the-flux-3-hopping-threads-and-schedulers
TL;DR
Reactor executes non-blocking/async tasks on a small number of threads. If a task blocks, its thread is blocked and all other tasks queued on it have to wait.
parallel should be used for fast non-blocking operations (the default option)
boundedElastic should be used to "offload" blocking tasks
In general, the Reactor API is concurrency-agnostic and uses the Scheduler abstraction to execute tasks. Schedulers have responsibilities very similar to ExecutorService.
Schedulers.parallel()
Should be the default option, used for fast non-blocking operations on a small number of threads. By default, the number of threads is equal to the number of CPU cores. It can be controlled via the reactor.schedulers.defaultPoolSize system property.
Schedulers.boundedElastic()
Used to execute longer operations (blocking tasks) as part of the reactive flow. It uses a thread pool with a default cap of (number of CPU cores x 10) threads (controlled by reactor.schedulers.defaultBoundedElasticSize) and a default queue size of 100,000 tasks per thread (controlled by reactor.schedulers.defaultBoundedElasticQueueSize).
subscribeOn or publishOn can be used to change the scheduler.
The following code shows how to wrap a blocking operation:
Mono.fromCallable(() -> {
    // blocking operation; someBlockingCall() is a hypothetical stand-in, e.g. a JDBC query
    return someBlockingCall();
}).subscribeOn(Schedulers.boundedElastic()); // run on a separate scheduler because the code blocks
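Note that subscribeOn moves the whole chain (including the source) onto the given scheduler, while publishOn only affects the operators downstream of it. A minimal sketch, where both map functions are hypothetical placeholders:
Flux.range(1, 10)
    .map(i -> cheapTransform(i))            // still runs on the original thread
    .publishOn(Schedulers.boundedElastic())
    .map(i -> blockingLookup(i))            // runs on boundedElastic from here on
    .subscribe();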
Schedulers.newBoundedElastic()
Similar to Schedulers.boundedElastic(), but useful when you need a separate thread pool for some specific operation.
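For example, a dedicated pool for one slow external system. A minimal sketch, assuming the standard Schedulers.newBoundedElastic(threadCap, queuedTaskCap, name) factory; the sizes, the name, and queryDatabase() are illustrative placeholders:
Scheduler jdbcScheduler = Schedulers.newBoundedElastic(10, 1000, "jdbc-pool"); // cap: 10 threads, 1000 queued tasks
Mono.fromCallable(() -> queryDatabase()) // hypothetical blocking call
    .subscribeOn(jdbcScheduler);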
Sometimes it's not obvious which code is blocking. One very useful tool for testing reactive code is BlockHound.
Schedulers provides various Scheduler flavors usable with publishOn or subscribeOn:
1) parallel(): Optimized for fast Runnable non-blocking executions
2) single(): Optimized for low-latency Runnable one-off executions
3) elastic(): Optimized for longer executions, an alternative for blocking tasks where the number of active tasks (and threads) can grow indefinitely
4) boundedElastic(): Optimized for longer executions, an alternative for blocking tasks where the number of active tasks (and threads) is capped
5) fromExecutorService(ExecutorService): to create new instances around Executors
https://projectreactor.io/docs/core/release/api/reactor/core/scheduler/Schedulers.html
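As a sketch of the last flavor, you can wrap your own executor (the pool size of 4 is arbitrary, for illustration only):
ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary size
Scheduler custom = Schedulers.fromExecutorService(pool);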

How to control how many tasks to run per executor in PySpark [duplicate]

I don't quite understand the spark.task.cpus parameter. It seems to me that a "task" corresponds to a "thread" or a "process", if you will, within the executor. Suppose that I set spark.task.cpus to 2.
How can a thread utilize two CPUs simultaneously? Couldn't it require locks and cause synchronization problems?
I'm looking at the launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of CPUs per task" there. So where/how does Spark eventually allocate more than one CPU to a task in standalone mode?
To the best of my knowledge, spark.task.cpus controls the parallelism of tasks in your cluster in the case where some particular tasks are known to have their own internal (custom) parallelism.
In more detail:
We know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1, then you will have spark.cores.max concurrent Spark tasks running at the same time.
You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (maybe each of your tasks spawns two threads, interacts with external tools, etc.). By setting spark.task.cpus accordingly, you become a good "citizen". Now if you have spark.cores.max=10 and spark.task.cpus=2, Spark will only create 10/2=5 concurrent tasks. Given that your tasks need (say) 2 threads internally, the total number of executing threads will never be more than 10. This means that you never go above your initial contract (defined by spark.cores.max).
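A minimal Java sketch of that contract, assuming the standard SparkConf API (the application name is a placeholder):
SparkConf conf = new SparkConf()
    .setAppName("parallel-tasks-demo") // placeholder name
    .set("spark.cores.max", "10")      // total cores the application may use
    .set("spark.task.cpus", "2");      // each task reserves 2 cores => at most 10/2 = 5 concurrent tasks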

Difference between ThreadCount and StepCount in TIBCO BW Engine

Can anyone explain the difference between the StepCount and ThreadCount properties of the TIBCO BW engine? I have tried to understand them through the TIBCO docs but was unable to.
Thanks in advance.
The ThreadCount property defines the number of threads (Java threads) which execute all your processes. So with the default value of 8 threads you can run 8 jobs simultaneously.
The StepCount, on the other hand, defines the number of activities executed before a thread can context-switch into another job.
Sample scenario:
a process with 5 activities
ThreadCount is 2
StepCount is 4
If there are 3 incoming requests, the first two requests spawn one job each. The third job is spawned, but is paused because no thread is available.
After the first job completes its fourth activity, its thread yields and can be assigned to another paused job. So the first job pauses and the third job starts to execute.
When the second job reaches its fourth activity, that thread yields as well and becomes available for re-assignment. So the second job pauses and the first job resumes (and completes its one remaining activity, freeing the thread for the second job again).
When the third job reaches its fourth activity, its thread yields once more; the remaining activities of jobs two and three are then completed as threads become free.
All of this is a theoretical scenario. What you usually need is to set the amount of concurrent jobs (so ThreadCount). The StepCount is close to irrelevant, because the engine will take care of the pooling and mapping of physical threads to virtual BW jobs.
ThreadCount
The ThreadCount concept states the number of threads a TIBCO BW engine can allocate. The default number of threads is eight.
The number of threads means the number of jobs that can be executed simultaneously in the engine. So the maximum number of jobs that can run concurrently in the engine is limited to the number of threads, that is, eight. This property specifies the size of the job thread pool, and is applied to all the AppNodes in the AppSpace if set at the AppSpace level.
A thread carries out a limited number of tasks or activities uninterrupted and then yields to the next job that is ready. Starting with the default value of eight threads, the thread count can be tuned to an optimum value, for example doubled until maximum CPU utilization is reached.
StepCount
The StepCount concept states the number of activities that are executed by an engine thread, without interruption, before yielding the engine thread to another job that is ready in the job pool. The default value of StepCount is -1. When the value is set to -1, the engine decides the required StepCount value. A low StepCount value may degrade engine performance due to frequent thread switches.
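As a sketch, in BW 6.x these two properties are typically set in the AppNode's config.ini; the property names below follow the BW 6 documentation, but verify them for your version:
bw.engine.threadCount=8
bw.engine.stepCount=-1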

What is the relation between number of thread and number of processor cores?

I am writing a server application that is thread-pool based (IOCP). But I don't know how many threads are appropriate. Is the thread count related to the number of processor cores?
If your work items never block, use threads = cores. If your threads never need to be descheduled, you can max out all cores by creating one thread per core.
If your work items sometimes block (which they shouldn't do much if you want to make the best use of IOCP), you need more threads. You need to measure how many.
Multiple threads make up a process, and the number of threads is not dependent on the number of cores. A single-core processor can handle a multi-threaded process using various scheduling schemes. That said, if you have multiple cores on your processor, different threads can run concurrently. So to run multiple threads at the same time, you need multiple cores; to run multiple threads, but not necessarily simultaneously (it can seem simultaneous, though), you can use a single core by implementing a scheduling system.
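A minimal Java sketch of the sizing rule from the first answer; the 2x oversubscription factor is a placeholder assumption that should be replaced by measurement:
int cores = Runtime.getRuntime().availableProcessors();
// CPU-bound work that never blocks: one thread per core saturates the CPU
ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
// work that sometimes blocks: oversubscribe; the factor 2 here is illustrative only
ExecutorService mixedPool = Executors.newFixedThreadPool(cores * 2);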
Some useful wiki pages for you:
http://en.wikipedia.org/wiki/Computer_multitasking
http://en.wikipedia.org/wiki/Thread_%28computing%29
http://en.wikipedia.org/wiki/Input/output_completion_port
http://en.wikipedia.org/wiki/Scheduling_%28computing%29
http://en.wikipedia.org/wiki/Thread_pool_pattern

Scheduling Priorities, Windows

Based on MSDN, the Windows OS schedules threads based on their base priority and uses a dynamic priority as a boost:
The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread.
From the above quote:
The system treats all threads with the same priority as equal
Does this mean that the system schedules threads based on their dynamic priority, and the base priority is used just as a lower limit for dynamic priority changes?
Thank you
Based on MSDN, the Windows OS schedules threads based on their base priority and uses a dynamic priority as a boost
Well, you follow that with a nice text snippet that shows no sign of a dynamic priority boost.
More information about that is in the documentation - for example http://msdn.microsoft.com/en-us/library/windows/desktop/ms684828(v=vs.85).aspx is a good start.
In simple words, the scheduler schedules threads based on their current (dynamic) priority, and a priority boost changes that, so boosted threads get scheduled differently.
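As a rough Java illustration of the base/dynamic split, assuming only that the JVM maps Thread priorities onto Windows thread priorities: the program sets the base level, while any boost is applied (and decayed) by the OS scheduler outside the program's control:
Thread worker = new Thread(() -> {
    // work goes here
});
worker.setPriority(Thread.MAX_PRIORITY); // sets the base priority level only
worker.start();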
