I have a Core i7 processor with 8 logical processors.
I'm trying to run a parallel task in .NET Core 2.2 using Parallel.For.
When I measure the start times, I see 9 tasks started in parallel.
Isn't it supposed to be just 8?
Below you can see:
i => [ThreadId],[ProcessorNumber] == starttime - endtime
[screenshot: parallel tasks result]
You can start however many tasks in parallel you want, but the processor only has 8 logical cores, so at most 8 threads can execute simultaneously; the rest will always queue up and wait their turn. Parallel.For does not cap its degree of parallelism at the core count by default, which is why you can see more than 8 iterations started.
So if you have 16 parallel processes, which each take 200 ms to run, then you will run processes 1-8 in parallel for 200 ms, then 9-16 in parallel for 200 ms, totalling 400 ms. If you had 4 logical cores, you would run processes 1-4, 5-8, 9-12, and 13-16 in parallel, totalling 800 ms.
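A rough shell illustration of that queuing arithmetic (a sketch, not .NET code): sleep 0.2 stands in for a 200 ms task, and xargs -P 8 mimics the 8-logical-core cap.

# 16 jobs of ~200 ms each, at most 8 at a time: expect two "waves", ~400 ms total.
time seq 1 16 | xargs -P 8 -I{} sleep 0.2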
In a declarative pipeline's parallel block, it is possible to specify multiple stages to execute in parallel on agents on the same node.
Let's say we have 3 stages defined and only two nodes available.
Stage A: 2 hrs
Stage B: 2 hrs
Stage C: 4 hrs
I want to be sure that stage C starts first, because then the total execution time will be 4 hours. If A and B start first, the total execution time will be 6 hours.
Is there any way to ensure Stage C is given priority to start first when all three stages are started using a parallel block?
Here is a hack that should work: run them all in parallel, but add sh 'sleep 5m' at the start of stages A and B. Just ensure that you have at least 3 executors on the node you run it on ;)
But still, if you run them all in parallel it should take only 4 hours, not 6.
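The ordering effect described in the question can be reproduced with a small shell timing experiment (a sketch, not Jenkins pipeline code): seconds stand in for hours, and xargs -P 2 mimics two executor slots.

# C=4s, A=B=2s on two slots: starting C first finishes in ~4s,
# starting A and B first takes ~6s.
time printf '4\n2\n2\n' | xargs -P 2 -I{} sleep {}
time printf '2\n2\n4\n' | xargs -P 2 -I{} sleep {}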
I create 1000 threads, and each thread calls Runtime.exec() to start a process.
But when I watch the running processes with
watch -n 1 'ps -ef|grep "AppName"'
I only see 4 processes running simultaneously at most.
Most of the time only 1 or 2 processes are running.
Does Runtime.exec() have a limit on how many processes can run in parallel?
You only get parallelism when you have many processors or different operations going on (e.g. a slow I/O process that is run in a separate thread while the main thread continues).
If you have more threads than cores, all running the same process, all you get is time slicing as the operating system gives each thread some time.
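A quick shell experiment makes that point (a sketch, not Java): the same 32 CPU-bound jobs take roughly the same wall-clock time whether you cap concurrency at the core count or launch them all at once, because only about one job per core executes at any instant.

# 32 identical CPU-bound jobs on an 8-core machine: both runs take roughly
# the same wall time, since at most ~8 can actually execute simultaneously.
work='i=0; while [ "$i" -lt 500000 ]; do i=$((i+1)); done'
time seq 1 32 | xargs -P 8  -I{} bash -c "$work"
time seq 1 32 | xargs -P 32 -I{} bash -c "$work"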
If I run some jobs in parallel like this:
#!/bin/bash
for i in $(seq 1 100); do
    ./program "data$i.txt" &
done
Does this mean that I need 100 cores? Or, in case I don't have 100 cores, will some of the jobs wait, or will they all be run on the lower number of cores, with more than one job allocated to a core? And if I do need 100 cores, what should I do to run 10 at a time, without having to change the for loop to 1 to 10 and run the bash file 10 times?
The operating system is responsible for process and thread scheduling.
"Or, in case I don't have 100 cores, will some of the jobs wait?"
Yes, the jobs will wait. But it probably won't be apparent to you. One job won't wait for another job to finish before it begins. As each job is running, the operating system's scheduling algorithm may interrupt the process executing the job and yield the CPU to another process. See: Scheduling
Excerpt:
Process scheduler
The process scheduler is a part of the operating system that decides which process runs at a certain point in time. It usually has the ability to pause a running process, move it to the back of the running queue and start a new process; such a scheduler is known as a preemptive scheduler, otherwise it is a cooperative scheduler.
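For the last part of the question - running only 10 jobs at a time - one common approach is to let xargs manage the concurrency (a minimal sketch using the same ./program and data files):

#!/bin/bash
# Run ./program on data1.txt .. data100.txt, with at most 10 jobs running at once.
seq 1 100 | xargs -P 10 -I{} ./program "data{}.txt"

Plain bash alternatives exist (for example calling wait after each batch of 10 background jobs), but xargs -P keeps all 10 slots busy as individual jobs finish.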
I am testing my UDF on a Windows virtual machine with 8 cores and 8 GB of RAM. I have created 5 files of about 2 GB each and run the Pig script after modifying "mapred.tasktracker.map.tasks.maximum".
The runtimes and statistics are as follows:
mapred.tasktracker.map.tasks.maximum = 2
duration = 20 min 54 sec

mapred.tasktracker.map.tasks.maximum = 4
duration = 13 min 38 sec, about 30 sec per task (35% better)

mapred.tasktracker.map.tasks.maximum = 8
duration = 12 min 44 sec, about 1 min per task (only 7% better)
Why such a small improvement when changing the setting? Any ideas? The job was divided into 145 tasks.
[screenshot: 4 slots]
[screenshot: 8 slots]
A couple of observations:
I imagine your Windows machine only has a single disk backing this VM, so there is a limit to how much data you can read off disk at any one time (and write back for the spills). By increasing the task slots, you're effectively driving up the read/write demands on your disk (and potentially causing more disk thrashing too). If you have multiple disks backing your VM (not virtual disks all on the same physical disk, I mean virtual disks backed by different physical disks), you would probably see a performance increase over what you've already seen.
By adding more map slots, you've reduced the number of assignment waves that the Job Tracker needs to do, and each wave has a polling overhead (the TT polling the jobs, the JT polling the TTs and assigning new tasks to free slots). A 2-slot TT vs. an 8-slot TT means roughly 145/2 ≈ 73 assignment waves (if all tasks ran for equal time, which is obviously not realistic) vs. 145/8 ≈ 19 waves; that's roughly 4x more polling with 2 slots (and it all adds up).
mapred.tasktracker.map.tasks.maximum configures the maximum number of map tasks that will be run simultaneously by a task tracker. There is a practical hardware limit to how many tasks a single node can run at a time. So there will be diminishing returns when you keep increasing this number.
For example, say the TaskTracker node has 8 cores, and 4 of those cores are being used by processes other than the TaskTracker. That leaves 4 cores for the mapred tasks. So your task time will improve as you go from mapred.tasktracker.map.tasks.maximum = 1 to 4, but after that it will just plateau, because the extra tasks will simply be waiting. In fact, if you increase it too much, the contention and context switching might make things slower. The recommended value for this parameter is the number of CPU cores - 1.
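If it helps, that setting lives in mapred-site.xml on each TaskTracker node (MRv1). A sketch, assuming you follow the cores-minus-one guideline on an 8-core node:

<!-- mapred-site.xml on the TaskTracker node; 7 assumes an 8-core node with
     one core left for the daemons, per the guideline above -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
</property>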
I realized I can spin up as many as 41 Resque workers on my MacBook Air. How do I calculate how many I can spin up on an arbitrary machine?
And is spinning up as many workers as possible optimal?
It really depends on what the workers are doing, I think.
For example, I have a Rails app that processes images using RMagick/Resque.
On my Intel quad-core Q9300 CPU with 4 GB of RAM (machine specs probably do matter!) I processed 100 images, several different times, with 1, 2, and 5 workers. Here is some of the data:
1 worker, time elapsed 1:35, average processing time per image 0.89949550366 seconds
2 workers, time elapsed 1:13, average processing time per image 1.41478641043 seconds
5 workers, time elapsed 1:13, average processing time per image 3.22651901574 seconds
As you can see, adding workers has diminishing returns. But it probably scales based on the task you are having your workers do.