If I have 5 processes arrive at the CPU all at different times in a CPU Burst with shortest process next scheduling, would overhead only exist if, lets say process one finishes before the next process arrives? The overhead being the idletime of the CPU?
You should think about the number of cpus. If two processes are using the same cpu, they will compete and slow the overall finish time down. A cpu switching between different processes or threads can slow things down more than idle time. So I would keep it to one process at a time per CPU.
Related
I ran the following Java code on a Linux server:
while (true) {
int a = 1+2;
}
It caused one CPU core to reach 100% usage. I'm confused about this, because I learnt that CPUs deal with tasks by time splitting, meaning that the CPU will do one task in one time slot (CPU time range scheduler). If there are 10 time slots, the while true task should have at most use 10% CPU usage, because the other 90% would be assigned to other tasks. So why is it 100%?
If your CPU usage is not at 100%, then the process can use as much as it wants (up to 100%), until other processes are requesting the use of the resource. The process scheduler attempts to maximize CPU usage, and never leave a process starved of CPU time if there are no other processes that need it.
So your while loop will use 100% of available idle CPU resources, and will only begin to use less when other CPU intensive processes are started up. (if you're using Linux/Unix, you could observe this with top by starting up your while loop, then starting another CPU intensive process and watch the % CPU drop for the process with the loop).
In a multitasking OS the CPU time is split between execution flows (processes, threads) - that's true. So what happens here - the loop is being executed until some point when clock interruption occurs which is used by OS to schedule next execution "piece" from other process or thread. But as soon as your code doesn't reach specific points (input/output or other system calls which can switch the process into the "waiting" state, such as waiting for a synchronization objects sleep operations, etc.) the process keeps being in the "running" state which tells scheduler to keep it in the execution queue and reschedule its execution at the first best opportunity. If there are some "competitive" processes which keep their "running" state for long period of time as well the CPU usage will be shared between your and all these processes, otherwise your process execution will rescheduled immediately which will cause continuous high CPU consumption anyway.
Given two programs that are multiprogrammed in a single processor computer:
Program ONE uses a large amount of CPU time and little I/O.
Program TWO uses a small amount of CPU time but performs a large number of I/O operations.
Which program should get the higher priority for dispatching the CPU? Why?
The second (IO-intensive one) should have highest priority: it has small impact on CPU and while it waits for IO completion the other process, the CPU-intensive, could do its job.
Then, when the IO is completed, the second task gets its chance to treat the IO result and post another IO request, then it is swapped out again in favor of CPU-intensive process.
Shortest Remaining Time First Scheduling should be used.
Otherwise there will be Convoy Effect check "Comparison with CPU-bound" on the mentioned webpage.
There is a convoy effect as the I/O bound process wait for the one big CPU bound process to get off the CPU. This effect results in lower CPU and device utilization than might be possible if the shorter CPU bound process is allowed to go first.
I know this question has been asked many times in many different manners, but it's still not clear for me what the CPU load % means.
I'll start explaining how I perceive the concepts now (of course, I might, and sure will, be wrong):
A CPU core can only execute one instruction at a time. It will not execute the next instruction until it finishes executing the current one.
Suppose your box has one single CPU with one single core. Parallel computing is hence not possible. Your OS's scheduler will pick up a process, set the IP to the entry point, and send that instruction to the CPU. It won't move to the next instruction until the CPU finishes executing the current instruction. After a certain amount of time it will switch to another process, and so on. But it will never switch to another process if the CPU is currently executing an instruction. It will wait until the CPU becomes free to switch to another process. Since you only have one single core, you can't have two processes executing simultaneously.
I/O is expensive. Whenever a process wants to read a file from the disk, it has to wait until the disk accomplishes its task, and the current process can't execute its next instruction until then. The CPU is not doing anything while the disk is working, and so our OS will switch to another process until the disk finishes its job in order not to waste time.
Following these principles, I've come myself to the conclusion that CPU load at a given time can only be one of the following two values:
0% - Idle. CPU is doing nothing at all.
100% - Busy. CPU is currently executing an instruction.
This is obviously false as taskmgr reports %1, 12%, 15%, 50%, etc. CPU usage values.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core (as reported by taskmgr)? While that given process is executing, what happens with the 99%?
What does it mean that the overall CPU usage is 19% (as reported by Rainmeter at the moment)?
If you look into the task manager on Windows there is Idle process, that does exactly that, it just shows amount of cycles not doing anything useful. Yes, CPU is always busy, but it might be just running in a loop waiting for useful things to come.
Since you only have one single core, you can't have two processes
executing simultaneously.
This is not really true. Yes, true parallelism is not possible with single core, but you can create illusion of one with preemptive multitasking. Yes, it is impossible to interrupt instruction, but it is not a problem because most of the instructions require tiny amount of time to finish. OS shares time with time slices, which are significantly longer than execution time of single instruction.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core
Most of the time applications are not doing anything useful. Think of application that waits for user to click a button to start processing something. This app doesn't need CPU, so it sleeps most of the time, or every time it gets time slice it just goes into sleep (see event loop in Windows). GetMessage is blocking, so it means that thread will sleep until message arrives. So what CPU load really means? So imagine the app receives some events or data to do things, it will do operations instead of sleeping. So if it utilizes X% of CPU means that over sampling period of time that app used X% of CPU time. CPU time usage is average metric.
PS: To summarize concept of CPU load, think of speed (in terms of physics). There are instantaneous and average speeds, so speaking of CPU load, there also are instantaneous and average measurements. Instantaneous is always equal to either 0% or 100%, because at some point of time process either uses CPU or not. If process used 100% of CPU in the course of 250ms and didn't use for next 750ms then we can say that process loaded CPU for 25% with sampling period of 1 second (average measurement can only be applied with certain sampling period).
http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.
This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU. Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run.
Also see: http://en.wikipedia.org/wiki/Load_(computing)
I am writing a parallel merge sort program. I use fork() to perform the parallel processing. I tried running 2 parallel processes, 4 processes, 8 processes and so on. Then I found that the one running with 2 processes required the least time to finish, i.e the highest performance. I think it's reasonable as my cpu is core 2 duo. For 4,8,16,32 processes, it seems to have a steady declining of performance, but after that the performance fluctuate (doesn't seem to have a pattern). Can someone explain that?
Plus according to the pattern, I have a feeling that when the number of processes used in the program is equal to the number of core that my cpu has, the my program could have the highest performance. But I am 100% sure. Can someone verify me? Or tell me what actually affect the performance of a parallel program.
Thanks in advance!!
With 2 cores any number of processes greater than 2 will have to share the processor time. You will incur overhead from process switching and you will never have more than two processes executing at one time. It is better to have just two processes run uninterrupted on your two cores.
As to why you were seeing a fluctuation in performance once you hit a large number of processes I'd have to make a guess that your OS is spending more time task switching between the processes than actually performing work doing the sort. The time it takes to switch tasks is an artifact of your OS's scheduler, amount of memory being used by individual tasks, caching, potential use of swap space, etc...
If you want to maximize the performance of parallel processes, the number of processes running concurrently should equal the number of processors times the number of cores on each processor. In your case, two. Any less then you have cores sitting idle not doing anything, any more, you have processes sitting idle waiting for time on a processor core.
3 processes should never be faster than 2 processes on a Core 2 Duo.
Also, forking only makes sense if you're doing CPU-expensive tasks:
Forking to print the message Hello world! twice is nonsense. The forking itself will consume more CPU-time than it could possibly save.
Forking to sort an array with 1,000,000 elements (if you use the proper sorting algorithm) will cut execution time roughly in half.
In particular, I'm looking at using TPL to start (and wait for) external processes. Does the TPL look at total machine load (both CPU and I/O) before deciding to start another task (hence -- in my case -- another external process)?
For example:
I've got about 100 media files that need to be encoded or transcoded (e.g. from WAV to FLAC or from FLAC to MP3). The encoding is done by launching an external process (e.g. FLAC.EXE or LAME.EXE). Each file takes about 30 seconds. Each process is mostly CPU-bound, but there's some I/O in there. I've got 4 cores, so the worst case (transcoding by piping the decoder into the encoder) still only uses 2 cores. I'd like to do something like:
Parallel.ForEach(sourceFiles,
sourceFile =>
TranscodeUsingPipedExternalProcesses(sourceFile));
Will this kick off 100 tasks (and hence 200 external processes competing for the CPU)? Or will it see that the CPU's busy and only do 2-3 at a time?
You're going to run into a couple of issues here. The starvation avoidance mechanism of the scheduler will see your tasks as blocked as they wait on processes. It will find it hard to distinguish between a deadlocked thread and one simply waiting for a process to complete. As a result it may schedule new tasks if your tasks run or a long time (see below). The hillclimbing heuristic should take into account the overall load on the system, both from your application and others. It simply tries to maximize work done, so it will add more work until the overall throughput of the system stops increasing and then it will back off. I don't think this will effect your application but the stavation avoidance issue probably will.
You can find more detail as to how this all works in Parallel Programming with Microsoft®.NET, Colin Campbell, Ralph Johnson, Ade Miller, Stephen Toub (an earlier draft is online).
"The .NET thread pool automatically manages the number of worker
threads in the pool. It adds and removes threads according to built-in
heuristics. The .NET thread pool has two main mechanisms for injecting
threads: a starvation-avoidance mechanism that adds worker
threads if it sees no progress being made on queued items and a hillclimbing
heuristic that tries to maximize throughput while using as
few threads as possible.
The goal of starvation avoidance is to prevent deadlock. This kind
of deadlock can occur when a worker thread waits for a synchronization
event that can only be satisfied by a work item that is still pending
in the thread pool’s global or local queues. If there were a fixed
number of worker threads, and all of those threads were similarly
blocked, the system would be unable to ever make further progress.
Adding a new worker thread resolves the problem.
A goal of the hill-climbing heuristic is to improve the utilization
of cores when threads are blocked by I/O or other wait conditions
that stall the processor. By default, the managed thread pool has one
worker thread per core. If one of these worker threads becomes
blocked, there’s a chance that a core might be underutilized, depending
on the computer’s overall workload. The thread injection logic
doesn’t distinguish between a thread that’s blocked and a thread
that’s performing a lengthy, processor-intensive operation. Therefore,
whenever the thread pool’s global or local queues contain pending
work items, active work items that take a long time to run (more than
a half second) can trigger the creation of new thread pool worker
threads.
The .NET thread pool has an opportunity to inject threads every
time a work item completes or at 500 millisecond intervals, whichever
is shorter. The thread pool uses this opportunity to try adding threads
(or taking them away), guided by feedback from previous changes in
the thread count. If adding threads seems to be helping throughput,
the thread pool adds more; otherwise, it reduces the number of
worker threads. This technique is called the hill-climbing heuristic.
Therefore, one reason to keep individual tasks short is to avoid
“starvation detection,” but another reason to keep them short is to
give the thread pool more opportunities to improve throughput by
adjusting the thread count. The shorter the duration of individual
tasks, the more often the thread pool can measure throughput and
adjust the thread count accordingly.
To make this concrete, consider an extreme example. Suppose
that you have a complex financial simulation with 500 processor-intensive
operations, each one of which takes ten minutes on average
to complete. If you create top-level tasks in the global queue for each
of these operations, you will find that after about five minutes the
thread pool will grow to 500 worker threads. The reason is that the
thread pool sees all of the tasks as blocked and begins to add new
threads at the rate of approximately two threads per second.
What’s wrong with 500 worker threads? In principle, nothing, if
you have 500 cores for them to use and vast amounts of system
memory. In fact, this is the long-term vision of parallel computing.
However, if you don’t have that many cores on your computer, you are
in a situation where many threads are competing for time slices. This
situation is known as processor oversubscription. Allowing many
processor-intensive threads to compete for time on a single core adds
context switching overhead that can severely reduce overall system
throughput. Even if you don’t run out of memory, performance in this
situation can be much, much worse than in sequential computation.
(Each context switch takes between 6,000 and 8,000 processor cycles.)
The cost of context switching is not the only source of overhead.
A managed thread in .NET consumes roughly a megabyte of stack
space, whether or not that space is used for currently executing functions.
It takes about 200,000 CPU cycles to create a new thread, and
about 100,000 cycles to retire a thread. These are expensive operations.
As long as your tasks don’t each take minutes, the thread pool’s
hill-climbing algorithm will eventually realize it has too many threads
and cut back on its own accord. However, if you do have tasks that
occupy a worker thread for many seconds or minutes or hours, that
will throw off the thread pool’s heuristics, and at that point you
should consider an alternative.
The first option is to decompose your application into shorter
tasks that complete fast enough for the thread pool to successfully
control the number of threads for optimal throughput.
A second possibility is to implement your own task scheduler
object that does not perform thread injection. If your tasks are of long
duration, you don’t need a highly optimized task scheduler because
the cost of scheduling will be negligible compared to the execution
time of the task. MSDN® developer program has an example of a
simple task scheduler implementation that limits the maximum degree
of concurrency. For more information, see the section, “Further Reading,”
at the end of this chapter.
As a last resort, you can use the SetMaxThreads method to
configure the ThreadPool class with an upper limit for the number
of worker threads, usually equal to the number of cores (this is the
Environment.ProcessorCount property). This upper limit applies for
the entire process, including all AppDomains."
The short answer is: no.
Internally, the TPL uses the standard ThreadPool to schedule its tasks. So you're actually asking whether the ThreadPool takes machine load into account and it doesn't. The only thing that limits the number of tasks simultaneously running is the number of threads in the thread pool, nothing else.
Is it possible to have the external processes report back to your application once they are ready? In that case you do not have to wait for them (keeping threads occupied).
Ran a test using TPL/ThreadPool to schedule a great number of tasks doing looped spins. Using an external app I've loaded one of the cores to 100% using proc affinity. The number of active tasks never decreased.
Even better, I ran multiple instances of the same CPU intensive .NET TPL enabled app. The number of threads for all the apps was the same, and never went below the number of cores, even though my machine was barely usable.
So theory aside, TPL uses the number of cores available, but never checks on their actual load. A very poor implementation in my opinion.