How does Linux schedule multiple child processes? - fork

I'd like to fork thousands of processes that can run in parallel, to hide IO latency.
My program maintains around 2000 child processes, plus one waiting thread that waits on child processes at all times, but it seems that not all processors are fully utilized.
When I run htop, only some processors are fully utilized; the others are not.
Each process executes a very short function (about 0.1 sec) and exits.
Is it because the children exit too early for htop to measure their CPU utilization?
When I make the forked processes run an infinite while loop, I do see full utilization on all cores...
I am curious whether the processes are executed in parallel but not measured by htop, or whether they are not executed in parallel at all.
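In case it helps, my setup looks roughly like this (a minimal C sketch, not my actual program; do_short_work is a stand-in for the real ~0.1 s job):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the short (~0.1 s) function each child runs. */
static void do_short_work(void)
{
    usleep(100000);
}

int main(void)
{
    const int max_children = 2000;
    int live = 0;

    for (int spawned = 0; spawned < 100000; spawned++) {
        if (live >= max_children) {   /* keep ~2000 children alive */
            wait(NULL);               /* reap one before forking more */
            live--;
        }
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {               /* child: do the work and exit */
            do_short_work();
            _exit(0);
        }
        live++;                       /* parent: count the new child */
    }
    while (live-- > 0)
        wait(NULL);                   /* reap the stragglers */
    return 0;
}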

Related

Why will "while true" use 100% of CPU resources?

I ran the following Java code on a Linux server:
while (true) {
    int a = 1 + 2;
}
It caused one CPU core to reach 100% usage. I'm confused by this, because I learned that CPUs handle tasks by time slicing, meaning the CPU works on one task per time slot (the scheduler divides CPU time into slices). If there are 10 time slots, the while-true task should use at most 10% of the CPU, because the other 90% would be assigned to other tasks. So why is it at 100%?
If your CPU usage is not at 100%, then the process can use as much as it wants (up to 100%) until other processes request the resource. The process scheduler tries to maximize CPU usage and never leaves a process starved of CPU time if no other process needs it.
So your while loop will use 100% of the available idle CPU resources, and will only begin to use less when other CPU-intensive processes start up. (If you're using Linux/Unix, you can observe this with top: start your while loop, then start another CPU-intensive process and watch the % CPU drop for the process with the loop.)
In a multitasking OS, CPU time is split between execution flows (processes, threads) - that's true. So what happens here is that the loop executes until a clock interrupt occurs, which the OS uses to schedule the next execution "slice" from some other process or thread. But as long as your code doesn't reach specific points (input/output or other system calls that can switch the process into the "waiting" state, such as waiting on synchronization objects, sleep operations, etc.), the process stays in the "running" state, which tells the scheduler to keep it in the run queue and reschedule it at the first opportunity. If there are "competing" processes that also keep their "running" state for long periods, CPU usage will be shared between your process and all of them; otherwise your process will be rescheduled immediately, which causes continuously high CPU consumption either way.
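You can see the difference between the two states with a pair of tiny loops (a hedged C sketch; variant 1 never blocks, variant 2 spends almost all of its time waiting in a system call):

#include <unistd.h>

int main(void)
{
    /* Variant 1: pure computation, never leaves the "running" state,
       so the scheduler gives it a CPU whenever nothing else needs one
       and top shows ~100% for this process. Uncomment to try:
       for (;;) { }                                                  */

    /* Variant 2: blocks in a system call almost all of the time, so
       the process sits in the "waiting" state and top shows ~0%.    */
    for (;;)
        sleep(1);
}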

When is overhead nonexistent in executing processes?

If I have 5 processes arriving at the CPU at different times, each with a CPU burst, under shortest-process-next scheduling, would overhead only exist if, let's say, process one finishes before the next process arrives? The overhead being the idle time of the CPU?
You should think about the number of CPUs. If two processes are using the same CPU, they will compete and slow down the overall finish time. A CPU switching between different processes or threads can slow things down more than idle time does. So I would keep it to one process at a time per CPU.
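To make the idle-time point concrete, here is a small simulation sketch in C (the arrival and burst times are invented for illustration; the CPU only goes idle when every remaining process arrives after the current job finishes):

#include <stdio.h>

#define N 5

int main(void)
{
    /* Hypothetical arrival and burst times for 5 processes. */
    int arrival[N] = {0, 9, 10, 12, 20};
    int burst[N]   = {7, 4, 1, 4, 2};
    int done[N] = {0};
    int time = 0, finished = 0, idle = 0;

    while (finished < N) {
        /* Shortest process next: pick the shortest job that has arrived. */
        int pick = -1;
        for (int i = 0; i < N; i++)
            if (!done[i] && arrival[i] <= time &&
                (pick < 0 || burst[i] < burst[pick]))
                pick = i;

        if (pick < 0) {        /* nothing has arrived yet: the CPU idles */
            idle++;
            time++;
            continue;
        }
        time += burst[pick];   /* non-preemptive: run it to completion */
        done[pick] = 1;
        finished++;
    }
    printf("total time: %d, idle time: %d\n", time, idle);
    return 0;
}

With these made-up numbers the run finishes at t=22 with 4 units of idle time, and both gaps occur exactly when the previous job finished before the next process arrived.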

Ruby multithreading performance issues

I am building a Ruby application. I have a set of images that I want to greyscale. My code used to look like this:
def Tools.grayscale_all_frames(frames_dir, output_dir)
  number_of_frames = get_frames_count(frames_dir)
  img_processor = ImageProcessor.new(frames_dir)
  create_dir(output_dir)
  for i in 1..number_of_frames
    img_processor.load_image(frames_dir + "/frame_%04d.png" % i)
    img_processor.greyscale_image
    img_processor.save_image_in_dir(output_dir, "frame_%04d" % i)
  end
end
After threading the code:
def Tools.greyscale_all_frames_threaded(frames_dir, output_dir)
  number_of_frames = get_frames_count(frames_dir)
  img_processor = ImageProcessor.new(frames_dir)
  create_dir(output_dir)
  greyscale_frames_threads = []
  for frame_index in 1..3
    greyscale_frames_threads << Thread.new(frame_index) { |frame_number|
      puts "Loading Image #{frame_number}"
      img_processor.load_image(frames_dir + "/frame_%04d.png" % frame_number)
      img_processor.greyscale_image
      img_processor.save_image_in_dir(output_dir, "frame_%04d" % frame_number)
      puts "Greyscaled Image #{frame_number}"
    }
  end
  puts "Starting Threads"
  greyscale_frames_threads.each { |thread| thread.join }
end
What I expected is a thread being spawned for each image. I have 1000 images at a resolution of 1920*1080. The way I see things is this: I have an array of threads that I call .join on. So does join take all the threads and start them, one after the other? Does that mean it waits until thread 1 is done and then starts thread 2? What is the point of multithreading then?
What I want is this:
Run all the threads at the same time, not one after the other. So mathematically, it would finish all 1000 frames in the same time it takes to finish 1 frame, right?
Also, can somebody explain to me what .join does?
From my understanding, .join stops the main thread until your thread(s) are done?
If you don't use .join, then the threads run in the background and the main thread just continues.
So what is the point of using .join? I want my main thread to continue running and have the other threads doing stuff in the background.
Thanks for any help/clarification!!
This is only true if you have 1000 CPU cores and massive amounts (read: hundreds and hundreds) of RAM.
The point of join is not to start the thread, but to wait until the thread has finished. So calling join on an array of threads is a common pattern for waiting for them all to finish.
Explaining all of this and clarifying your misconception requires digging a little deeper. At the C/assembler level, most modern OSes (Windows, Mac, Linux, and some others) use a preemptive scheduler. If you have only one core, two programs running in parallel is a complete illusion. In reality, the kernel is switching between the two every few milliseconds, giving all of us slow-processing humans the illusion of parallel processing.
In newer, more modern CPUs, there is often more than one core. The most powerful CPUs today go up to (I think) 16 real cores + 16 hyperthreaded cores (see here). This means you could actually run 32 tasks completely in parallel. But even that does not ensure that if you start 32 threads they will all finish at the same time.
Because of competition for resources shared between cores (some caches, all of the RAM, the hard drive, the network card, etc.), and the essentially random nature of preemptive scheduling, the amount of time your thread takes can be estimated within a certain range, but not exactly.
Unfortunately, all of this breaks down when you get to Ruby. Because of some hairy internal details of the threading model and compatibility, only one thread can execute Ruby code at a time. So if your image processing is done in C, happy joy joy. If it's written in Ruby, well, all the threads in the world aren't going to help you now.
To be able to actually run Ruby code in parallel, you have to use fork. fork is only available on Linux and Mac, not Windows, but you can think of it as a fork in a road: one process goes in, two processes come out. Multiple processes can run on all your different cores at once.
So, take @Stefan's advice: use a queue and a number of worker threads equal to the number of CPU cores. And don't expect so much of your computer. Now you know why ;).
So join will take all the threads and start them, one after the other?
No, the threads are started when invoking Thread#new. It creates a new thread and executes the given block within that thread.
join will stop the main thread until your thread(s) is or are done?
Yes, it will suspend execution until the receiver (each of your threads) exits.
So what is the point of using join?
Sometimes you want to start some tasks in parallel but you have to wait for each task to finish before you can continue.
I want my main thread to continue running and have the other threads in the background doing stuff
Then don't call join.
After all, it's not a good idea to start 1,000 threads in parallel. Your machine can only run as many tasks in parallel as it has CPUs. So instead of starting 1,000 threads, place your jobs / tasks in a queue / pool and process them using some worker threads (number of workers = number of CPUs).
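For illustration, that queue-plus-workers pattern might look like this in C with pthreads (a sketch, not a prescribed implementation; process_frame is a stub for the real per-frame work, and the shared "queue" is just an index guarded by a mutex):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_JOBS 1000

static int next_job = 0;   /* the "queue": next frame index to hand out */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stub for greyscaling one frame. */
static void process_frame(int frame) { (void)frame; }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int job = (next_job < NUM_JOBS) ? next_job++ : -1;
        pthread_mutex_unlock(&lock);
        if (job < 0)
            return NULL;          /* queue drained, worker exits */
        process_frame(job);
    }
}

int main(void)
{
    long nworkers = sysconf(_SC_NPROCESSORS_ONLN);  /* workers = CPUs */
    if (nworkers < 1) nworkers = 1;
    if (nworkers > 64) nworkers = 64;

    pthread_t tids[64];
    for (long i = 0; i < nworkers; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (long i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);  /* join: wait for each worker to finish */

    printf("processed %d jobs with %ld workers\n", NUM_JOBS, nworkers);
    return 0;
}

In Ruby itself the same shape would presumably use a Queue plus a handful of threads (or fork, given the constraint on Ruby threads described above), but the C version shows the general pattern.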

How would the number of parallel processes affect the performance of the CPU?

I am writing a parallel merge sort program. I use fork() to perform the parallel processing. I tried running 2 parallel processes, 4 processes, 8 processes, and so on. I found that the run with 2 processes required the least time to finish, i.e. had the highest performance. I think that's reasonable, as my CPU is a Core 2 Duo. For 4, 8, 16, and 32 processes, performance seems to decline steadily, but after that it fluctuates (there doesn't seem to be a pattern). Can someone explain that?
Plus, given that pattern, I have a feeling that when the number of processes used in the program equals the number of cores my CPU has, the program could reach its highest performance. But I am not 100% sure. Can someone verify this, or tell me what actually affects the performance of a parallel program?
Thanks in advance!!
With 2 cores, any number of processes greater than 2 will have to share the processor time. You will incur overhead from process switching, and you will never have more than two processes executing at one time. It is better to have just two processes run uninterrupted on your two cores.
As to why you saw performance fluctuate once you hit a large number of processes, my guess is that your OS is spending more time task-switching between the processes than actually doing the sort. The time it takes to switch tasks is an artifact of your OS's scheduler, the amount of memory used by individual tasks, caching, potential use of swap space, and so on.
If you want to maximize the performance of parallel processes, the number of processes running concurrently should equal the number of processors times the number of cores on each processor; in your case, two. Any fewer and you have cores sitting idle doing nothing; any more and you have processes sitting idle waiting for time on a processor core.
3 processes should never be faster than 2 processes on a Core 2 Duo.
Also, forking only makes sense if you're doing CPU-expensive tasks:
Forking to print the message Hello world! twice is nonsense. The forking itself will consume more CPU time than it could possibly save.
Forking to sort an array with 1,000,000 elements (if you use the proper sorting algorithm) will cut execution time roughly in half.
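A sketch of that second case in C (assumptions not in the original answer: the array lives in a shared anonymous mapping so the parent can see the half sorted by the child, and the parent merges the two sorted halves after wait()):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 1000000

static int cmp(const void *a, const void *b)
{
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

int main(void)
{
    /* Shared mapping so the parent sees the half sorted by the child. */
    int *a = mmap(NULL, N * sizeof *a, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return 1;
    for (int i = 0; i < N; i++)
        a[i] = rand();

    pid_t pid = fork();
    if (pid == 0) {                    /* child sorts the first half */
        qsort(a, N / 2, sizeof *a, cmp);
        _exit(0);
    }
    qsort(a + N / 2, N - N / 2, sizeof *a, cmp);  /* parent: second half */
    wait(NULL);                        /* both halves are now sorted */

    /* Merge the two sorted halves into a scratch buffer. */
    int *out = malloc(N * sizeof *out);
    int i = 0, j = N / 2, k = 0;
    while (i < N / 2 && j < N)
        out[k++] = a[i] <= a[j] ? a[i++] : a[j++];
    while (i < N / 2) out[k++] = a[i++];
    while (j < N)     out[k++] = a[j++];

    printf("first=%d last=%d\n", out[0], out[N - 1]);
    free(out);
    munmap(a, N * sizeof *a);
    return 0;
}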

Is it advantageous to use threads in Windows?

Some of the fellows in the office think that when they add threads to their code, Windows will assign those threads to run on different processors of a multi-core or multi-processor machine. Then, when this doesn't happen, everything gets blamed on those threads colliding with one another on said multi-core or multi-processor machine.
Could someone debunk or confirm this notion?
When an application spawns multiple threads, it is indeed possible for them to get assigned to different processors. In fact, it is not uncommon for incorrect multi-threaded code to run ok on a single-processor machine but then display problems on a multi-processor machine. (This happens if the code is safe in the face of time slicing but broken in the face of true concurrency.)
You can generally only have 1 thread per CPU running optimally, but unless your application has explicit thread affinity to one processor, then yes, Windows will assign your threads to free processors.
Windows will automatically execute multiple threads on different processors if the machine has multiple processors. If you are running on a single-processor machine, the threads are time-sliced, but when you move the process to a multi-processor machine, it will automatically take advantage of the multiple processors.
Because the code runs simultaneously, the threads may be more likely to step on each other's toes on a multi-core machine than on a single-core machine, since both threads can be writing to a shared location at the same time, rather than that only happening when a thread swap is timed just right.
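One way to see this for yourself (a sketch against the Win32 API; each thread burns some CPU and then reports which processor it happens to be on at that moment):

#include <stdio.h>
#include <windows.h>

/* Each thread spins for a while, then reports its current processor. */
static DWORD WINAPI worker(LPVOID arg)
{
    volatile unsigned long long x = 0;
    for (unsigned long long i = 0; i < 100000000ULL; i++)
        x += i;                       /* keep the thread busy */
    printf("thread %d is on processor %lu\n",
           (int)(INT_PTR)arg, GetCurrentProcessorNumber());
    return 0;
}

int main(void)
{
    HANDLE threads[4];
    for (int i = 0; i < 4; i++)
        threads[i] = CreateThread(NULL, 0, worker, (LPVOID)(INT_PTR)i, 0, NULL);
    WaitForMultipleObjects(4, threads, TRUE, INFINITE);
    for (int i = 0; i < 4; i++)
        CloseHandle(threads[i]);
    return 0;
}

On a multi-core machine the printed processor numbers will typically differ; on a single-core machine they will all be 0.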
Yes, threads and multithreading have almost nothing to do with the number of CPUs or cores in a machine...
EDIT ADDITION: To talk about "how many threads run on a CPU" is an oxymoron. Only one thread can ever run on a single CPU at a time. Multithreading is about multiple threads in a PROCESS, not on a CPU. Before another thread can run on any CPU, the thread currently on that CPU has to STOP running, and its state must be preserved somewhere so that the OS can restart it when it gets its next "turn".
Code runs in "processes", which are logical abstractions that can run one or more sequences of code instructions and manage computer resources independently of other processes. Within a process, each separate sequence of code instructions is a "thread". Which CPU they run on is irrelevant. A single thread can run on a different CPU each time it is allocated a CPU to run on... and multiple threads, as they are each allocated CPU cycles, may, by coincidence, run on the same CPU (although obviously not simultaneously).
The OS (a component of the OS) is responsible for "running" threads. It keeps an in-memory list of all threads and constantly "switches" among them (this is called a context switch). It does this on a single-CPU machine in almost exactly the same way as on a multi-CPU machine. Even on a multi-CPU machine, each time it "turns on" a thread, it might give it to a different CPU, or to the same CPU, as it did the last time.
There is no guarantee that threads of your process will be assigned to run on different CPUs.
