Difference between core and processor - cpu

What is the difference between a core and a processor?
I've already looked for it on Google, but I only get definitions for multi-core and multi-processor, which is not what I am looking for.

A core is usually the basic computation unit of the CPU - it can run a single program context (or multiple ones if it supports hardware threads such as hyperthreading on Intel CPUs), maintaining the correct program state, registers, and correct execution order, and performing the operations through ALUs. For optimization purposes, a core can also hold on-core caches with copies of frequently used memory chunks.
A CPU may have one or more cores to perform tasks at a given time. These tasks are usually software processes and threads that the OS schedules. Note that the OS may have many threads to run, but the CPU can only run X such tasks at a given time, where X = number cores * number of hardware threads per core. The rest would have to wait for the OS to schedule them whether by preempting currently running tasks or any other means.
In addition to the one or many cores, the CPU will include some interconnect that connects the cores to the outside world, and usually also a large "last-level" shared cache. There are multiple other key elements required to make a CPU work, but their exact locations may differ according to design. You'll need a memory controller to talk to the memory, I/O controllers (display, PCIe, USB, etc..). In the past these elements were outside the CPU, in the complementary "chipset", but most modern design have integrated them into the CPU.
In addition the CPU may have an integrated GPU, and pretty much everything else the designer wanted to keep close for performance, power and manufacturing considerations. CPU design is mostly trending in to what's called system on chip (SoC).
This is a "classic" design, used by most modern general-purpose devices (client PC, servers, and also tablet and smartphones). You can find more elaborate designs, usually in the academy, where the computations is not done in basic "core-like" units.

An image may say more than a thousand words:
* Figure describing the complexity of a modern multi-processor, multi-core system.
Source:
https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization

Let's clarify first what is a CPU and what is a core, a central processing unit CPU, can have multiple core units, those cores are a processor by itself, capable of execute a program but it is self contained on the same chip.
In the past one CPU was distributed among quite a few chips, but as Moore's Law progressed they made to have a complete CPU inside one chip (die), since the 90's the manufacturer's started to fit more cores in the same die, so that's the concept of Multi-core.
In these days is possible to have hundreds of cores on the same CPU (chip or die) GPUs, Intel Xeon. Other technique developed in the 90's was simultaneous multi-threading, basically they found that was possible to have another thread in the same single core CPU, since most of the resources were duplicated already like ALU, multiple registers.
So basically a CPU can have multiple cores each of them capable to run one thread or more at the same time, we may expect to have more cores in the future, but with more difficulty to be able to program efficiently.

CPU is a central processing unit. Since 2002 we have only single core processor i.e. we will only perform a single task or a program at a time.
For having multiple programs run at a time we have to use the multiple processor for executing multi processes at a time so we required another motherboard for that and that is very expensive.
So, Intel introduced the concept of hyper threading i.e. it will convert the single CPU into two virtual CPUs i.e we have two cores for our task. Now the CPU is single, but it is only pretending (masqueraded) that it has a dual CPU and performs multiple tasks. But having real multiple cores will be better than that so people develop making multi-core processor i.e. multiple processors on a single box i.e. grabbing a multiple CPU on single big CPU. I.e. multiple cores.

In the early days...like before the 90s...the processors weren't able to do multi tasks that efficiently...coz a single processor could handle just a single task...so when we used to say that my antivirus,microsoft word,vlc,etc. softwares are all running at the same time...that isn't actually true. When I said a processor could handle a single process at a time...I meant it. It actually would process a single task...then it used to pause that task...take another task...complete it if its a short one or again pause it and add it to the queue...then the next. But this 'pause' that I mentioned was so small (appx. 1ns) that you didn't understand that the task has been paused. Eg. On vlc while listening to music there are other apps running simultaneously but as I told you...one program at a time...so the vlc is actually pausing in between for ns so you dont underatand it but the music is actually stopping in between.
But this was about the old processors...
Now-a- days processors ie 3rd gen pcs have multi cored processors. Now the 'cores' can be compared to a 1st or 2nd gen processors itself...embedded onto a single chip, a single processor. So now we understood what are cores ie they are mini processors which combine to become a processor. And each core can handle a single process at a time or multi threads as designed for the OS. And they folloq the same steps as I mentioned above about the single processor.
Eg. A i7 6gen processor has 8 cores...ie 8 mini processors in 1 i7...ie its speed is 8x times the old processors. And this is how multi tasking can be done.
There could be hundreds of cores in a single processor
Eg. Intel i128.
I hope I explaned this well.

I have read all answers, but this link was more clear explanation for me about difference between CPU(Processor) and Core. So I'm leaving here some notes from there.
The main difference between CPU and Core is that the CPU is an electronic circuit inside the computer that carries out instruction to perform arithmetic, logical, control and input/output operations while the core is an execution unit inside the CPU that receives and executes instructions.

Intel's picture is helpful, as shown by Tortuga's best answer. Here's a caption for it.
Processor: One semiconductor chip, the CPU (central processing unit) seated in one socket, circa 1950s-2010s. Over time, more functions have been packed onto the CPU chip. Prior to the 1950s releases of single-chip processors, one processor might have spread across multiple chips. In the mid 2010s the system-on-a-chip chips made it slightly more sketchy to equate one processor to one chip, though that's generally what people mean by processor, as in "this computer has an i7 processor" or "this computer system has four processors."
Core: One block of a CPU, executing one instruction at a time. (You'll see people say one instruction per clock cycle, but some CPUs use multiple clock cycles for some instructions.)

Related

Why the time needed for executing code on "Hackerrank" site varies significantly?

When you try some coding challenges on www.hackerrank.com and you track the time also (e.g. in C++ using the "chrono" library), you will see every time you run the Code you will get a different time needed for execution at exactly the same Code. The variance I estimate at 10 to 30 percent. What is the reason that Code execution times are varying significantly at equal code? Which factors account to it?
It could be the Server System; but are there even physical reasons (stochastic processes in electronic components)?
Almost certainly just a busy system where your process competes for CPU time against other processes. (And competes for memory bandwidth even when it has a CPU to itself).
Also it's likely that the server runs on a CPU with SMT (Simultaneous multithreading) that uses each physical cores as two logical cores, so the "shared execution resources" that processes compete for includes stuff inside a single CPU core, not just L3 cache and memory bandwidth.
Intel calls their SMT tech "hyperthreading"; most servers these days run on Intel Xeon CPUs. AMD Zen also uses SMT, so either way unless the server admins disabled it, when the OS schedules tasks onto both logical cores of a single physical core, they will slow each other down some (with the slowdown amount mostly depending on their average IPC (instructions per cycle) - two threads that have a lot of branch mispredicts will typically not see much slowdown (so throughput nearly doubles). But two threads that could both nearly saturate the FP multiplier ALUs on their own will each run at nearly half speed (near zero gain in throughput).
OSes "know about" SMT / Hyperthreading and can detect which logical cores share a physical core. They try to avoid scheduling threads to the same physical core while there are some physical cores with both threads idle.
See also Modern Microprocessors
A 90-Minute Guide! which covers SMT, with the background to understand what execution resources are being shared.
but are there even physical reasons (stochastic processes in electronic components)?
No, CPUs are deterministic (except for the rdrand instruction which uses true analog electrical noise as a randomness source).
CPUs do use dynamic voltage and frequency scaling to save power (idle vs. max turbo, or somewhere in between if thermal limits don't allow max turbo.) https://en.wikipedia.org/wiki/Intel_Turbo_Boost / https://en.wikipedia.org/wiki/Dynamic_frequency_scaling
See also Idiomatic way of performance evaluation? for more microbenchmarking pitfalls / warm-up effects.

My program use only 25% of cpu power

My program with single thread uses only 25% of CPU with 2 cores (intel i5-3210M). Why not 50% (one core)? Program is being tested on macbook pro with windows 7 64. I think that problem is hyper-threading and because of this program uses only one logical core (25% of cpu power). How can I give more CPU power to my program?
It's important for me because this program works with big set of data and it takes about 30 hours to finish calculations.
It is expectable as you said with your CPU(which has 4 logical processors). You can search for the ways of transforming your program in order to use more than one threads. I can recommend you to search for "parallel programming", "concurrent programming","multi-threading". if you are using MS VC++ PPL library is so easy to use..OpenMP is a more prowerful tool which is available in Linux also. There are lots more ways and libraries for this issue but you need to choose it according to your OS, compiler, environment, programming language and your problem.
However, the easiest solution is to run it on a desktop machine with a better CPU and cross your fingers to get the results as quick as possible.
This program uses only one logical core (25% of cpu power). How can I give more CPU power to my programm? ...this programm works with big set of data ... it takes about 30 hours to finish calculations.
Divide up your data set into (at least) 4 separate pieces. With that much data, you want to think in terms of indexes into the data instead of copying data elements to 4 separate structures. Create a separate thread for each segment of your data, and have that thread only process one segment. You may need to set a processor affinity for your threads.
If the data streams, or must be processed in order, think in terms of queing elements for processing, where individual threads will then dequeue and process each item. This works well when the enqueue operation is relatively fast compared to processing an item, and can be done by a single master thread, while each dequeue/processing operation is more expensive.
Choosing the correct number of threads is tricky. Modern CPUs and operating systems are designed to switch tasks from time to time. This will always be an expensive operation, but the scheduler will want to do something else every so often, even if your process may seem like the best candidate. Therefore, you can often get the best throughput by overloading your CPUs to a small extent, such that you may want two or three threads per logical cpu. One way to manage this is through use the ThreadPool object.

Multicore thread processing order

I am having some real trouble finding this info online, im in Uni monday so i could use the library then but the soon the better. When a system has multicore processors, does each processor take a thread from the first process in the ready queue or does it take one from the first and one from the second? Also just to check, threads will be sent and fetched from the multicores concurrently by the OS right? If anyone could point me in the right direction resource wise, that would be great!
The key thing is to appreciate what the machine's architecture actually is.
A "core" is a CPU with cache with a connection to the system memory. Most machine architectures are Symmetric Multi-Processing, meaning that the system memory is equally accessible by all cores in the system.
Most operating systems run a scheduler thread on each core (Linux does). The scheduler has a list of threads it is responsible for, and it will run them to the best of its ability on the core that it controls. The rules it uses to choose which thread to run will be either round robin, or priority based, etc; ie all the normal scheduling rules. So far it is just like a scheduler that you would find in a single core computer. To some extent each scheduler is independent from all the other schedulers.
However, this an SMP environment, meaning that it really doesn't matter which core runs which thread. This is because all the cores can see all the memory, and all the code and data for all threads in the entire system is stored in that single memory.
So the schedulers talk amongst themselves to help each other out. Schedulers with too many threads to run can pass a thread over to a scheduler whose core is under utilised. They are load balancing within the machine. "Pass a thread over" means copying the data structure that describes the thread (thread id, which data, which code).
So that's about it. As the only communication between cores is via memory it all relies on an effective mutual exclusion semaphore system being available, which is something the hardware has to allow for.
The Difficulty
So I've painted a very simple picture, but in practice the memory is not perfectly symmetrical. SMP these days is synthesised on top of HyperTransport and QPI.
Long gone are the days when cores really did have equal access to the system memory at the electronic level. At the very lowest layer of their architecture AMD are purely NUMA, and Intel nearly so.
Nowadays a core has to send a request to other cores over a high speed serial link (HyperTransport or QPI) asking them to send data that they've got in their attached memory. Intel and AMD have done a good job of making it look convincingly like SMP in the general case, but it's not perfect. Data in memory attached to a different core takes longer to get hold of. It's insanely complex - the cores are now nodes on a network - but it's what they've had to do to get improved performance.
So schedulers take that into account when choosing which core should run which thread. They will try to place a thread on a core that is closest to the memory holding the data that the thread has access to.
The Future, Again
If the world's software ecosystem could be weaned off SMP the hardware guys would be able to save a lot of space on the silicon, and we would have faster more efficient systems. This has been done before; Transputers were a good attempt at a strictly NUMA architecture.
NUMA and Communicating Sequential Processes would today make it far easier to write multi threaded software that scales very easily and runs more efficiently than today's SMP shared memory behemoths.
SMP was in effect a cheap and nasty way of bringing together multiple cores, and the cost in terms of software development difficulties and inefficient hardware has been very high.

What is difference between 'Cores across processors' and 'Number of CPUs'?

E.g. Consider following is processor configuration of my machine:
Intel(R) Core(TM)i5 CPU 650 #3.20GHz (4 CPUs)
Then how should i find out how many 'Cores across processors' My machine have?
Is it the 4 cores[i.e. Number of CPU]?
I have referred following links but still i does not get clear idea:
http://www.ehow.com/how_6873203_do-number-core-processors-windows_.html
Can anyone please clear my doubt?
Cores across processors means nothing, or at least, nothing in particular, it's a generic and non-technical assumption/phrase with no exact meaning or no meaning at all.
According to Intel this CPU provides 2 physical cores with Hyper Threading and this mean that you get 4 logical cores or so called threads.
Hyper Threading is an Intel Technology that for each core provides 2 threads, so 2*2 = 4 threads.
I think that this is the closest answer to what you are asking here.
Let's clarify first what is a CPU and what is a core, a central processing unit CPU, can have multiple core units, those cores are a processor by itself, capable of execute a program but it is self contained on the same chip.
In the past one CPU was distributed among quite a few chips, but as Moore's progressed they made to have a complete CPU inside one chip (die), since the 90's the manufacturer's started to fit more cores in the same die, so that's the concept of Multi-core.
In these days is possible to have hundreds of cores on the same CPU (chip or die) GPU's, Intel Xeon. Other technique developed no the 90's was simultaneous multi-threading, basically they found that was possible to have another thread in the same single core CPU, since most of the resources were duplicated already like ALU, multiple registers.
So basically a CPU can have multiple cores each of them capable to run one thread or more at the same time, we may expect to have more cores in the future, but with more difficulty to be able to program efficiently.

Cores and hyperthreading

I'm writing an extremely optimized and CPU-intensive multithreaded code in C which performs a task in a more or less limited time space. During this time it does not venture out of its L1 cache except to load initial values and to store final results. So essentially this is a parallelized code which scales linearly for every core added. This is what happens on non-HT cores.
On my 2-core i5 with HT (which the BIOS does not allow to be disabled - this is an impractical solution anyway) I get an annoyingly dismal improvement when going from one core to two. My hypothesis is that the first thread runs alone on a core and the second shares the core with the first.
There are functions in the Windows API to retrieve info about available cores and HTs. But how do I make use of this information to ensure that there is only one thread on one hyperthread per core?
This article might be able to help:
http://msdn.microsoft.com/en-us/magazine/cc300701.aspx#S11
See the "CPU Affinity" section and the "Detecting Hyper-Threading" section.
The OS will be using the HT logical cores whether or not you are, and the upshot of that is that the cache is effectively halved in size. You can pin a thread to a logical core, but I suspect it won't help you. Your problem is the mere presence of HT. You do need to turn it off.

Resources