MATLAB parallel computation - parallel-processing

Is it mandatory to have the Parallel Computing Toolbox to use all the cores of my CPU?
Is it possible to make a computation run on all the cores of a single CPU without any toolbox?

Related

Is it possible to mix CPU and GPU computations with ArrayFire?

I have to solve a mathematical problem which can be either dense or sparse for different time values. I would like to use the GPU for the dense case and the CPU for the sparse case (assuming the switches do not occur so often that they make the process inefficient).
AFAIK, ArrayFire can be configured to use both the CPU and the GPU, but this has limitations, e.g. I cannot define a matrix to be used in both configurations.
Is there a way to achieve this approach?
Can a matrix be shared among CPU and GPU, and be processed by one of them at a time?
Can two matrices, with different characteristics, be processed one by the CPU and one by the GPU?

Measuring Efficiency of GPU Program

I have both serial and parallel program using GPU.
The serial program takes 112.9 seconds to finish.
The parallel program with the GPU takes 3.16 seconds to finish.
Thus, I get a speedup of 35.73.
Can I measure the efficiency of the program using the formula Efficiency = Speedup / NumberOfThreads?
The number of threads will be 1024.
Note that the ratio of time on the CPU to time on the GPU is the speedup, not the efficiency; the efficiency is that speedup divided by the number of processing elements. You might want to try a multicore CPU implementation as well and compare it with the GPU implementation.
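The arithmetic from the question above can be sketched in a few lines of Python (the numbers are the ones quoted; the variable names are mine):

```python
# Timings quoted in the question above.
serial_time = 112.9    # seconds, serial CPU version
parallel_time = 3.16   # seconds, GPU version
num_threads = 1024     # GPU threads launched

speedup = serial_time / parallel_time   # ratio of serial to parallel time, ~35.73
efficiency = speedup / num_threads      # speedup per processing element, ~0.035

print(f"speedup    = {speedup:.2f}")
print(f"efficiency = {efficiency:.4f}")
```

An efficiency of roughly 3.5% is common when dividing by the raw thread count, since GPU threads are far weaker than CPU cores; dividing by the number of streaming multiprocessors instead gives a figure closer to CPU-style efficiency.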

Estimate parallel efficiency using unicore processor

We know that the parallel efficiency of a program running on a multicore system can be calculated as speedup/N where N is the number of cores. So in order to use this formula first we need to execute the code on a multicore system and need to know the speedup.
I would like to know: if I don't have a multicore system, is it possible to estimate the speedup of the given code on a multicore system just by executing it on a unicore processor?
I have access to performance counters (instructions per cycle, number of cache misses, number of instructions, etc.) and I only have binaries of the code.
[Note: I estimated parallel_running_time (T_P) = serial_running_time / N, but this estimate has unacceptable error.]
Thanks
Read up on Amdahl's Law, especially the bit about parallelization.
For you to determine how much you can speed up your program, you have to know what parts of the program can benefit from parallelization and what parts must be executed sequentially. If you know that, and if you know how long the serial and the parallel parts take (individually) on a single processor, then you can estimate how fast the program will be on multiple processors.
From your description, it seems that you don't know which parts can make use of parallel processing and which parts have to be executed sequentially. So it won't be possible to estimate the parallel running time.
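The reasoning in the answer above is exactly Amdahl's Law: if a fraction p of the serial running time can be parallelized, the best possible speedup on N processors is 1 / ((1 - p) + p/N). A minimal Python sketch, with an assumed (hypothetical) parallel fraction of 90%:

```python
def amdahl_speedup(parallel_fraction, num_cores):
    """Upper bound on speedup per Amdahl's Law, given the fraction of the
    serial running time that can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_cores)

# Hypothetical example: 90% of the running time parallelizes.
for n in (2, 4, 8, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
# No matter how many cores, the speedup can never exceed 1 / 0.1 = 10x.
```

This is also why the naive estimate T_P = serial_running_time / N in the question has large error: it implicitly assumes the parallel fraction is 100%.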

Comparison between multiprocessing and parallel processing

Can someone tell me the exact difference between multiprocessing and parallel processing? I am a little bit confused. Thanks for your help.
Multiprocessing
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.
Parallel Processing
In computers, parallel processing is the processing of program instructions by dividing them among multiple processors, with the objective of running a program in less time. In the earliest computers, only one program ran at a time.
Multiprocessing: running more than one process on a single processor.
Parallel processing: running a process on more than one processor.
Multiprocessing
A processing technique in which multiple processors or multiple processing cores in a single computer each work on a different job.
Parallel processing
A processing technique in which multiple processors or multiple processing cores in a single computer work together to complete one job more quickly.
Multiprocessing operating systems enable several programs to run concurrently; UNIX is one of the most widely used multiprocessing systems. Multiprocessing means the use of two or more central processing units (CPUs) at the same time. Most new computers have dual-core processors, or feature two or more processors, and are therefore called multiprocessor computers.
Parallel processing: the simultaneous use of more than one CPU to execute a program. Ideally, parallel processing makes a program run faster because there are more engines (CPUs) running it. Most computers have just one CPU, but some models have several; there are even computers with thousands of CPUs. With single-CPU computers, it is possible to perform parallel processing by connecting the computers in a network.
MULTIPROCESSING: simply means using two or more processors within a computer, or having two or more cores within a single processor, to execute more than one process simultaneously.
PARALLEL PROCESSING: the execution of one job by dividing it across different computers/processors.
Multiprocessing is doing work with the use of many processors or cores.
Parallel processing is dividing one or more jobs into small parts and processing each part concurrently.

MATLAB Parallel Computing Toolbox - Parallelization vs GPU?

I'm working with someone who has some MATLAB code that they want to be sped up. They are currently trying to convert all of this code into CUDA to get it to run on a GPU. I think it would be faster to use MATLAB's Parallel Computing Toolbox to speed this up, and to run it on a cluster that has MATLAB's Distributed Computing Toolbox, allowing me to run this across several different worker nodes. Now, as part of the Parallel Computing Toolbox, you can use things like gpuArray. However, I'm confused as to how this would work. Are things like parfor (parallelization) and gpuArray (GPU programming) compatible with each other? Can I use both? Can something be split across different worker nodes (parallelization) while also making use of whatever GPUs are available on each worker?
They think it's still worth exploring the time it takes to convert all of their MATLAB code to CUDA code to run on a machine with multiple GPUs... but I think the right approach would be to use the features already built into MATLAB.
Any help, advice, direction would be really appreciated!
Thanks!
When you use parfor, you are effectively dividing your for loop into tasks, with one task per loop iteration, and splitting up those tasks to be computed in parallel by several workers where each worker can be thought of as a MATLAB session without an interactive GUI. You configure your cluster to run a specified number of workers on each node of the cluster (generally, you would choose to run a number of workers equal to the number of available processor cores on that node).
On the other hand, gpuArray indicates to MATLAB that you want to make a matrix available for processing by the GPU. Under the hood, MATLAB marshals the data from main memory to the graphics board's internal memory. Certain MATLAB functions (there's a list of them in the documentation) can operate on gpuArrays, and the computation happens on the GPU.
The key differences between the two techniques are that parfor computations happen on the CPUs of the cluster's nodes, with direct access to main memory. CPU cores typically have a high clock rate, but there are typically fewer of them in a CPU cluster than there are GPU cores. Individually, GPU cores are slower than a typical CPU core and their use requires that data be transferred from main memory to video memory and back again, but there are many more of them in a cluster. As far as I know, hybrid approaches are supposed to be possible, in which you have a cluster of PCs and each PC has one or more NVIDIA Tesla boards and you use both parfor loops and gpuArrays. However, I haven't had occasion to try this yet.
If you are mainly interested in simulations, GPU processing is the perfect choice. However, if you want to analyse (big) data, go with parallelization. The reason for this is that GPU processing is only faster than CPU processing if you don't have to copy data back and forth. In the case of a simulation, you can generate most of the data on the GPU and only need to copy the result back. If you try to work with bigger data on the GPU you will very often run into out-of-memory problems.
Parallelization is great if you have big data structures and more than 2 cores in your computer's CPU.
If you write it in CUDA it is guaranteed to run in parallel at the chip-level versus going with MATLAB's best guess for a non-parallel architecture and your best effort to get it to run in parallel.
Kind of like drinking fresh mountain water run-off versus buying filtered water. Go with the purist solution.

Resources