Multi-core and parallel processing

What is the difference between parallel processing and multi-core processing?

Parallel and multi-core processing both refer to the same thing: the ability to execute code at the same time (on more than one core/CPU/machine). So in this sense, multi-core is just a means of doing parallel processing.
On the other hand, concurrency (which is probably what you mean by parallel processing) refers to having multiple units of execution (threads or processes) that are interleaved. This can happen either on a single-core CPU or on many cores/CPUs, or even on many machines (clusters).
Summing up, multicore is a subset of parallel processing, and concurrency can occur with or without parallelism. The field that studies this is distributed systems or distributed computing.
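To make "concurrency without parallelism" concrete, here is a minimal Python sketch: two coroutines interleave on a single thread under asyncio, so their steps alternate even though only one of them executes at any instant. The task names and step counts are illustrative only.

```python
import asyncio
import threading


async def worker(name: str, steps: int, log: list) -> None:
    """Record each step, then yield control back to the event loop."""
    for i in range(steps):
        # Record which task ran and on which OS thread it ran.
        log.append((f"{name}{i}", threading.get_ident()))
        await asyncio.sleep(0)  # cooperative yield: lets the other task run


async def run_interleaved(steps: int = 3) -> list:
    log: list = []
    # Both coroutines are scheduled on the same event loop, i.e. one thread.
    await asyncio.gather(worker("A", steps, log), worker("B", steps, log))
    return log


if __name__ == "__main__":
    for step, thread_id in asyncio.run(run_interleaved()):
        print(step, thread_id)
```

The steps come out as A0, B0, A1, B1, A2, B2, all tagged with the same thread id: the two tasks are concurrent (interleaved) but not parallel.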

Parallel processing just refers to a program running more than one part simultaneously, usually with the different parts communicating in some way. This might be on multiple cores, multiple threads on one core (which is really simulated parallel processing), multiple CPUs, or even multiple machines.
Multicore processing is usually a subset of parallel processing.
Multicore processing means code working on more than one "core" of a single CPU chip. A core is like a little processor within a processor. So making code work for multicore processing is nearly always about the parallelization aspect (though it would also include removing any core-specific assumptions, which you shouldn't normally have anyway).
As far as algorithm design goes, if an algorithm is correct from a parallel-processing point of view, it will be correct on multicore.
However, if you need to optimise your code to get it to run as fast as possible "in parallel" then the differences between multicore, multi-cpu, multi-machine, or vectorised will make a big difference.

Parallel processing can be done inside a single core with multiple threads.
Multi-Core processing means distributing those threads to make use of the multiple cores in a CPU.

Related

Emulate a very fast (virtual) CPU core

I know that the usual method for making a big math computation faster is to use multiprocessing / parallel processing: we split the job into, for example, 4 parts, and we let 4 CPU cores run in parallel (parallelization). This is possible in Python, for example, with the multiprocessing module: on a 4-core CPU, it allows you to use 100% of the computer's processing power instead of only 25% for a single-process job.
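A minimal sketch of the split-into-4 approach with Python's multiprocessing module, assuming the job is a sum of squares over 0..n-1 (a stand-in chosen only because its parts are independent). Each worker process sums one contiguous chunk, and the parent adds the partial results.

```python
from multiprocessing import Pool


def split_ranges(n: int, parts: int) -> list:
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for k in range(parts):
        stop = start + step + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_squares(bounds: tuple) -> int:
    """Work done by one process: sum of i*i over its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n: int, parts: int = 4) -> int:
    # One worker process per chunk; on a 4-core CPU the chunks can run on
    # 4 cores at once instead of leaving 3 cores idle.
    with Pool(processes=parts) as pool:
        return sum(pool.map(sum_squares, split_ranges(n, parts)))


if __name__ == "__main__":  # required on platforms that spawn worker processes
    print(parallel_sum_squares(10_000_000))
```

This only pays off because no chunk needs another chunk's result.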
But let's say we want to speed up a computation job that is not easily splittable.
Example: we are given a number generator function generate(n) that takes the previously-generated number as input, and "it is said to have 10^20 as period". We want to check this assertion with the following pseudo-code:
a = 17
for i = 1..10^20
    a = generate(a)
check if a == 17
Instead of having a computer's 4 CPU cores (3.3 GHz) running "in parallel" with a total of 4 processes, is it possible to emulate one very fast single-core CPU of 13.2 GHz (4*3.3) running one single process with the previous code?
Is such technique available for a desktop computer? If not, is it available on cloud computing platforms (AWS EC2, etc.)?
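The question's pseudocode can be made runnable in Python. Since `generate` is not specified, the sketch below assumes a hypothetical toy linear congruential generator, `(5*a + 3) % 16`, whose period (16) is small enough to actually check, and starts from seed 1 instead of 17 because the toy generator only produces values 0..15. Every iteration needs the previous result, which is exactly why this loop cannot be split across cores.

```python
def generate(a: int) -> int:
    """Hypothetical stand-in for the question's generate(): a tiny LCG mod 16."""
    return (5 * a + 3) % 16


def returns_to_seed(seed: int, n: int) -> bool:
    """Run the serial chain n times and check whether it is back at seed."""
    a = seed
    for _ in range(n):
        a = generate(a)  # each step depends on the previous one
    return a == seed


def period(seed: int) -> int:
    """Count steps until the chain first returns to seed (assumes it does)."""
    a, steps = generate(seed), 1
    while a != seed:
        a, steps = generate(a), steps + 1
    return steps


if __name__ == "__main__":
    print(period(1))               # 16
    print(returns_to_seed(1, 16))  # True
```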

Single-threaded performance is extremely valuable; it's much easier to write sequential code than to explicitly expose thread-level parallelism.
If there was an easy and efficient general-purpose way to do what you're asking which works when there is no parallelism in the code, it would already be in widespread use. Either internally inside multi-core CPUs, or in software if it required higher-level / larger-scale code transformations.
Out-of-order CPUs can find and exploit instruction-level parallelism within a single thread (over short distances, like a couple hundred instructions), but you need explicit thread-level parallelism to take advantage of multiple cores.
This is similar to How does a single thread run on multiple cores? over on SoftwareEngineering.SE, except that you've already ruled out any easy-to-find parallelism including instruction-level parallelism. (And the answer is: it doesn't. It's the hardware of a single core that finds the instruction-level parallelism in a single thread; my answer there explains some of the microarchitectural details of how that works.)
The reverse process: turning one big CPU into multiple weaker CPUs does exist, and is useful for running multiple threads which don't have much instruction-level parallelism. It's called SMT (Simultaneous MultiThreading). You've probably heard of Intel's Hyperthreading, the most widely known implementation of SMT. It trades single-threaded performance for more throughput, keeping more execution units fed with useful work more of the time. The cost of building a single wide core grows at least quadratically, which is why typical desktop CPUs don't just have a single massive core with 8-way SMT. (And note that a really wide CPU still wouldn't help with a totally dependent instruction stream, unless the generate function has some internal instruction-level parallelism.)
SMT would be good if you wanted to test 8 different generate() functions at once on a quad-core CPU. Without SMT, you could alternate in software between two generate chains in one thread, so out-of-order execution could be working on instructions from both dependency chains in parallel.
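A sketch of the "alternate between two generate chains in one thread" idea. The loop body advances two independent dependency chains per iteration, so an out-of-order core running compiled code could overlap their instructions. This shows only the shape of the transformation: in CPython the interpreter overhead dwarfs any instruction-level parallelism, and the 64-bit LCG used as `generate` is a hypothetical stand-in with illustrative constants.

```python
def generate(a: int) -> int:
    """Hypothetical generator: a 64-bit LCG with illustrative constants."""
    return (6364136223846793005 * a + 1442695040888963407) % 2**64


def run_one(seed: int, n: int) -> int:
    """One chain: every step waits for the previous step's result."""
    a = seed
    for _ in range(n):
        a = generate(a)
    return a


def run_two_interleaved(seed_a: int, seed_b: int, n: int) -> tuple:
    """Advance two independent chains in the same loop iteration."""
    a, b = seed_a, seed_b
    for _ in range(n):
        a = generate(a)  # chain 1: depends only on the previous a
        b = generate(b)  # chain 2: independent of chain 1
    return a, b
```

Both chains produce exactly what two separate runs would; the interleaving only changes how much independent work the hardware can see at once.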
Auto-parallelization by compilers at compile time is possible for source that has some visible parallelism, but if generate(a) isn't "separable" (not the correct technical term, I think) then you're out of luck.
e.g. if it's return a + hidden_array[static_counter++]; then the compiler can use math to prove that summing chunks of the array in parallel and adding the partial sums will still give the same result.
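The "sum chunks in parallel, then add the partial sums" transformation can be illustrated in Python, where `hidden_array` is just a hypothetical list of integers. Integer addition is associative, so regrouping the sum into chunks cannot change the result (floating-point addition is not exactly associative, which is one reason compilers are more careful there). Threads are used only to keep the sketch short; under CPython they will not actually run the additions in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum(hidden_array: list) -> int:
    a = 0
    for x in hidden_array:  # a = a + hidden_array[counter++], one step at a time
        a += x
    return a


def chunked_sum(hidden_array: list, chunks: int = 4) -> int:
    size = max(1, -(-len(hidden_array) // chunks))  # ceiling division, at least 1
    parts = [hidden_array[i:i + size] for i in range(0, len(hidden_array), size)]
    # Each partial sum is independent of the others, so in compiled code the
    # chunks could be summed on separate cores (or SIMD lanes).
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partials = list(pool.map(sum, parts))
    return sum(partials)
```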
But if there's truly a serial dependency through a (like even a simple LCG PRNG), and the software doesn't know any mathematical tricks to break the dependency or reduce it to a closed form, you're out of luck. Compilers do know tricks like sum(0..n) = n*(n+1)/2 (evaluated slightly differently to avoid integer overflow in a partial result), or a+a+a+... (n times) is a * n, but that doesn't help here.
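The closed-form tricks can be checked in Python. The "evaluated slightly differently" part is sketched here as halving whichever of n and n+1 is even before multiplying, so the intermediate product is never larger than the final result; this is one overflow-safe formulation for fixed-width integers, not necessarily the exact code a given compiler emits (Python's own integers never overflow, so here it only demonstrates the arithmetic).

```python
def sum_loop(n: int) -> int:
    """What the source code says: add 0, 1, ..., n one at a time."""
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_closed(n: int) -> int:
    """Closed form n*(n+1)/2, dividing the even factor by 2 first."""
    if n % 2 == 0:
        return (n // 2) * (n + 1)
    return n * ((n + 1) // 2)


def repeated_add(a: int, n: int) -> int:
    """a + a + ... + a (n times), which a compiler can turn into a * n."""
    total = 0
    for _ in range(n):
        total += a
    return total
```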

There is a scheme studied mostly in the academy called "Thread Decomposition". It aims to do more or less what you ask about - given a single-threaded code, it tries to break it down into multiple threads in order to divide the work on a multicore system. This process can be done by a compiler (although this requires figuring out all possible side effects at compile time which is very hard), by a JIT runtime, or through HW binary-translation, but each of these methods has complicated limitations and drawbacks.
Unfortunately, other than being automated, this process has very little appeal, as it can hardly match true manual parallelization done by a person who understands the code. It also doesn't simply scale performance according to the number of threads, since it usually incurs a large overhead in the form of code that has to be duplicated.
Example paper by some nice folks from UPC in Barcelona: http://ieeexplore.ieee.org/abstract/document/5260571/

Can parallel processing be achieved?

Can an MCU really do parallel processing?
Let's just say that I want to count down, send data through another interface, and do one more job, such as lighting up an LED, all at the same time.
Is that even possible?

A processor with multiple execution units or cores can perform parallel processing. Most microcontrollers do not have multiple execution units.
Some architectures support SIMD (Single Instruction/Multiple Data) instructions that can generate multiple results from a single instruction - this is a low level form of parallel processing, similarly DSPs (Digital Signal Processors) and microcontrollers with DSP instructions support dual or multiple MAC (multiply/accumulate) units that are also a form of parallel processing. Both SIMD and MAC are used primarily for number crunching and signal processing applications. High end DSPs often support other instruction level parallel execution capabilities.
Another low-level architecture feature that allows parallel execution is pipeline execution. This allows instructions that may take multiple cycles to run to generate one result per cycle by running different stages of the same operation simultaneously.
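As a rough worked example, assuming an ideal pipeline with k equal-length stages (one cycle each) and no stalls, n independent operations need:

```latex
\begin{aligned}
T_{\text{pipelined}} &= k + (n - 1) \text{ cycles} \\
T_{\text{unpipelined}} &= n k \text{ cycles} \\
\text{speedup} &= \frac{n k}{k + n - 1} \;\xrightarrow{\; n \to \infty \;}\; k
\end{aligned}
```

For example, with k = 5 and n = 100 that is 104 cycles instead of 500. Once the pipeline is full, one result completes per cycle; real pipelines lose some of this to stalls and data dependencies.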
Most microcontrollers can support a multi-tasking or multi-threading scheduler that can give the impression of concurrent execution by scheduling execution time to each task according to the scheduling algorithm used. While this is not parallel processing and in fact adds an overhead rather than accelerates processing, it is useful in other ways such as functional partitioning of the code and, in the case of a real-time priority based preemptive scheduler, achieving real-time response to events. For the example use case you give in your question, this form of scheduling is entirely appropriate and adequate. See Real-time Operating System (RTOS)
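A minimal sketch of the scheduling idea for the three jobs in the question (count down, send data, light an LED), written in Python for readability. Each task is a generator that does one small step and then yields, and a round-robin loop gives every task a turn. The task bodies, names and counts are hypothetical stand-ins for real timer, UART and GPIO code, and this cooperative scheduler is simpler than the preemptive, priority-based scheduler an RTOS provides: it interleaves the work rather than running it in parallel.

```python
from collections import deque


def countdown(start, log):
    for n in range(start, -1, -1):
        log.append(f"count {n}")
        yield  # give up the CPU after each step


def send_data(payload, log):
    for byte in payload:
        log.append(f"send {byte:#04x}")  # stand-in for writing one byte to a UART
        yield


def blink_led(times, log):
    for i in range(times):
        log.append("led on" if i % 2 == 0 else "led off")  # stand-in for a GPIO toggle
        yield


def run_round_robin(tasks):
    """Give each task one step in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it from the ready queue
        ready.append(task)


if __name__ == "__main__":
    log = []
    run_round_robin([countdown(2, log), send_data(b"\x01\x02", log), blink_led(2, log)])
    print(log)
```

With a countdown from 2, two payload bytes and two LED toggles, the log alternates count, send and LED steps until the shorter tasks run out.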
Microcontroller architectures that do support true parallel processing include XMOS, PicoChip, and the Cell processor. Historically the Transputer pioneered parallel processing in microprocessors.
A way of achieving a high level of parallelism at a low level, where individual operations of the same process can occur simultaneously (when one does not depend on the result of the other, or a pipeline is used), is to implement a process on an FPGA: essentially to implement the processing in hardware rather than software, although the languages used to program FPGAs share similarities with software languages.

A company named Parallax makes an 8 core MCU called Propeller that does parallel processing. Their programming language "Spin" is interesting, object oriented, scriptish, but also has inline assembly.

Difference between concurrency and simultaneous?

I am now studying parallel computing and algorithms, and I am a little bit confused about the terms concurrent execution and simultaneous execution.
What is the difference between these terms? When do we have to use concurrent and when do we have to use simultaneous in parallel computing?

Simultaneous execution is about utilizing multiple resources (cores, HW threads, etc.) in order to perform multiple tasks at the same time. The tasks don't have to interact in any way; you may have two different applications running simultaneously on two different cores, for example, or on the same core.
The art of designing systems to be able to perform multiple tasks at the same time can be said to deal with simultaneous execution. Hyper-threading, for example, is also called "SMT" (simultaneous multithreading), since it deals with the ability to run two threads with their full contexts at the same time on a single core (this is Intel's approach; AMD has a slightly different solution, see Difference between intel and AMD multithreading).
Concurrency is a term residing on a higher level of abstraction, relating to the OS world. It's a property of your execution environment in which you have multiple tasks that may be executed over time, while you have no control over the order or even the form of interleaving in which they're performed. It doesn't really matter if they operate simultaneously on multiple cores, on one core with SMT, or even on a single-threaded core with some preemption mechanism and some scheduling algorithm that breaks the tasks into chunks and constantly swaps between them. The important thing here is that concurrency forces you to design your tasks in a way that guarantees correctness (especially if they interact or share data) on any type of system with any order or interleaving.
If the task is designed correctly (with proper locking, barriers, semaphores, and anything guaranteeing correct data flow) and the OS does its job properly (saving states on context switch for example or clearing caches and shooting down TLB entries when needed), then it can run with any form of execution model "under the hood".
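A small Python illustration of "proper locking": several threads increment a shared counter, and the read-modify-write is wrapped in a `threading.Lock` so the total is correct no matter how the threads are interleaved or run in parallel. The thread and iteration counts are arbitrary.

```python
import threading


def locked_count(num_threads: int = 4, increments: int = 10_000) -> int:
    counter = 0
    lock = threading.Lock()

    def work() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:  # makes the read-modify-write of counter atomic
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished its increments
    return counter
```

Without the lock, `counter += 1` is not guaranteed to be atomic, so some interleavings can lose updates.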
Since you're referring to parallel algorithms, the proper term for you is probably concurrent execution.
There are quite a lot of examples, with additional links to sources, in this thread (I won't copy them here, to avoid plagiarism): What is the difference between concurrency and parallelism?

Comparison between multiprocessing and parallel processing

Can someone tell me the exact difference between multiprocessing and parallel processing? I am a little bit confused. Thanks for your help.

Multi Processing
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.
Parallel Processing
In computers, parallel processing is the processing of program instructions by dividing them among multiple processors with the objective of running a program in less time. In the earliest computers, only one program ran at a time.

Multiprocessing: running more than one process on a single processor.
Parallel processing: running a process on more than one processor.

Multiprocessing
A processing technique in which multiple processors or multiple processing cores in a single computer each work on a different job.
Parallel processing
A processing technique in which multiple processors or multiple processing cores in a single computer work together to complete one job more quickly.
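Following the two definitions just above, a Python sketch of the contrast using `concurrent.futures`: in the first function each worker gets a different, unrelated job; in the second, one job (a sum over a large range) is split so the workers cooperate on it. The jobs themselves are hypothetical placeholders. The executor is passed in, so the same code can run on a process pool (separate cores) or, for a quick check, a thread pool.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def word_count(text: str) -> int:
    return len(text.split())


def checksum(data: bytes) -> int:
    return sum(data) % 256


def partial_sum(bounds: tuple) -> int:
    start, stop = bounds
    return sum(range(start, stop))


def run_different_jobs(executor: Executor) -> tuple:
    """Multiprocessing in the sense above: each worker does its own job."""
    f1 = executor.submit(word_count, "one two three")
    f2 = executor.submit(checksum, b"\x10\x20\x30")
    return f1.result(), f2.result()


def run_one_job_split(executor: Executor, n: int, parts: int = 4) -> int:
    """Parallel processing in the sense above: workers share one job."""
    step = max(1, -(-n // parts))  # ceiling division, at least 1
    chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
    return sum(executor.map(partial_sum, chunks))


if __name__ == "__main__":
    for make_pool in (ThreadPoolExecutor, ProcessPoolExecutor):
        with make_pool(max_workers=4) as pool:
            print(make_pool.__name__, run_different_jobs(pool), run_one_job_split(pool, 1_000_000))
```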

Multiprocessing operating systems enable several programs to run concurrently. UNIX is one of the most widely used multiprocessing systems. Multiprocessing means the use of two or more central processing units (CPUs) at the same time. Most new computers have dual-core processors or feature two or more processors, so they are called multiprocessor computers.
Parallel Processing: the simultaneous use of more than one CPU to execute a program. Ideally, parallel processing makes a program run faster because there are more engines (CPUs) running it. Most computers have just one CPU, but some models have several. There are even computers with thousands of CPUs. With single-CPU computers, it is possible to perform parallel processing by connecting the computers in a network.

MULTIPROCESSING: simply means using two or more processors within a computer, or having two or more cores within a single processor, to execute more than one process simultaneously.
PARALLEL PROCESSING: is the execution of one job by dividing it across different computers/processors.

Multiprocessing is doing work with many processors or cores.
Parallel processing is dividing one or more jobs into small parts and giving each part a chance to be processed.

When should I use parallel-programming?

What could be a typical or real problem for using parallel programming? It can be quite challenging to implement. On the internet they explain how to use it but not why.

Performance is the most common reason to use parallel programming. But not all programs will become faster by using parallel programming. In most cases your algorithm consists of parts that are parallelizable and parts that are inherently sequential. You always have to reason about the potential performance gain of using parallel programming. In some cases the overhead of using it will actually make your program slower. Have a look at Amdahl's law to learn more about the potential performance improvements you can reach.
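Amdahl's law puts a number on this: if a fraction p of the running time can be parallelized across n workers and the rest is sequential, the best-case speedup is 1 / ((1 - p) + p / n). A small Python helper that ignores any parallelization overhead:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Upper bound on speedup when a fraction p of the work runs on n workers."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("p must be in [0, 1] and n must be >= 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    for n in (2, 4, 8, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Even with 90% of the work parallelizable, the speedup can never exceed 1 / 0.1 = 10, however many cores are added.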
If you only want some examples of parallel computations: there are some classes of algorithms that are inherently parallel; see the article on the "dwarfs" of Berkeley.

Another reason for using a multithreaded application architecture is its responsiveness. There are certain functions which block program execution for some time, e.g. reads from files or the network, waiting for user input, etc. While waiting like this does not consume CPU power, it often blocks or slows the program flow.
Using threads in such cases is simply good practice to make the code clearer. Instead of using (often complex or unintuitive) checks for input, integrating those checks into the program flow, and manually switching between handling input and other tasks, a programmer may choose to use threads and let one thread wait for input and another, e.g., perform calculations.
In other words, multiple threads sometimes allow for better use of the different resources at your computer's disposal: network, disk, input devices, or simply the monitor.
Generalization: using multiple threads (including parallel data processing) is advisable when the speed and responsiveness gains outweigh the synchronization costs and work required to parallelize the application.
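A minimal Python sketch of the "one thread waits for input, another computes" pattern. The blocking input is simulated with a `queue.Queue`, standing in for a file, socket or keyboard read; the input string and the computation are hypothetical placeholders.

```python
import queue
import threading


def wait_for_input(inbox: queue.Queue, received: list) -> None:
    """Blocks on the queue without using CPU, like a blocking read would."""
    received.append(inbox.get())


def run_demo() -> tuple:
    inbox: queue.Queue = queue.Queue()
    received: list = []
    listener = threading.Thread(target=wait_for_input, args=(inbox, received))
    listener.start()

    # Meanwhile the main thread keeps doing useful work instead of polling.
    result = sum(i * i for i in range(100_000))

    inbox.put("user typed: hello")  # the simulated input finally arrives
    listener.join()
    return result, received[0]
```

The main thread never checks for input itself; the waiting is isolated in its own thread, which is what keeps the program flow simple.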

The reason why there is increased interest in parallel programming is partly because the hardware we use today is more parallel (multicore processors, many-core GPUs). To fully benefit from this hardware, you need to program in parallel.
Interestingly, parallel processing also improves battery life:
Having 4 cores at 1 GHz draws less power than one single core at 4 GHz.
A phone with a multicore CPU will try to run as many tasks as possible simultaneously, so it can turn off the CPU when all work is done. This is sometimes called "race to idle".
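A rough back-of-the-envelope check of the 4 × 1 GHz versus 1 × 4 GHz power claim, assuming the textbook dynamic-power model P ≈ C V² f, assuming supply voltage must scale roughly in proportion to frequency, and ignoring static (leakage) power. Real chips deviate from this, so treat it as an approximation only:

```latex
\begin{aligned}
P &\approx C V^2 f, \qquad V \propto f \;\Rightarrow\; P \propto f^3 \\
P_{4 \times 1\,\mathrm{GHz}} &\propto 4 \cdot 1^3 = 4 \\
P_{1 \times 4\,\mathrm{GHz}} &\propto 1 \cdot 4^3 = 64
\end{aligned}
```

Under these assumptions the four slow cores deliver the same aggregate cycles per second for roughly a sixteenth of the dynamic power, provided the work actually parallelizes.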
Now, some programs are easier to parallelize than others. You should not randomly try to parallelize your entire code base. But it can be a useful exercise to do so even if there is no business reason: then you will be more ready on the day you really need it.

There are very few problems which can't be solved more quickly by a parallel program than by a serial program. There are very few computers which do not have multiple processing units.
I conclude, therefore, that you should use parallel programming all the time.

Develop Reference
