I'm researching the possibility of building a cluster of powerful machines geared towards HPC (FLOP-heavy) computation, so I have been reviewing the top Intel Xeon models, and I was surprised to discover that the Xeon E7 models do not support AVX vectorization while the Xeon E5 models do. The E7, on the other hand, supports SSE 4.2, which appears to be an optimization unrelated to FLOP computation and HPC, geared instead towards speeding up character processing, e.g. XML parsing.
To be sure I have understood the differences correctly, I would like to ask whether this is the case: E7 Xeon models do not support AVX and are geared towards "systems" workloads, while E5 Xeon models support AVX and are geared towards FLOP-intensive HPC computation.
OK, I found a good report that answers my question:
Comparing the Intel E7-4780 (10-core, 2.4 GHz) with an Intel E5-4650 (8-core, 2.7 GHz), you'll find that the E5 server outperforms the E7 server in the following benchmarks:
- CAE
- SPECfp*_rate_base2006
- Numerical Weather
- Financial Services
- Life Sciences
- Linpack AVX
- SPECint*_rate_base2006
The E7 server outperforms the E5 server in the following benchmarks:
- Java* Middleware
- OLTP Database
- Enterprise Resource Planning
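Since the distinction above hinges on which instruction-set extensions a CPU exposes, you can verify what a given machine reports at runtime. Below is a minimal C sketch using GCC/Clang's __builtin_cpu_supports builtin; it is not specific to the E5/E7 parts, it simply queries whichever CPU it runs on (on Linux, the flags line of /proc/cpuinfo gives the same information).

```c
/* Minimal sketch: report whether the running CPU supports SSE 4.2 and AVX.
 * Requires GCC 4.8+ or a recent Clang (for __builtin_cpu_supports). */
#include <stdio.h>

int main(void)
{
    printf("SSE 4.2: %s\n", __builtin_cpu_supports("sse4.2") ? "yes" : "no");
    printf("AVX:     %s\n", __builtin_cpu_supports("avx")    ? "yes" : "no");
    return 0;
}
```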
I'm about to build a desktop computer and I'm trying to understand how these PCIe lanes are distributed. The goal is to be able to calculate how many lanes I need for a certain setup. I'm looking at the Asus Z170-P motherboard, which, according to the specifications [1]:
It contains the Z170 chipset.
The board is advertised as "CrossfireX Ready", which I believe implies you can plug in two graphics cards.
The specs say it has two PCIe x16 slots, one that works at x16 mode and another one that only works at x4 mode.
First, according to the Z170 chipset specifications, it supports up to 20 PCIe lanes. However, there is no single processor that fits into the LGA1151 socket with support for 20 or more PCIe lanes [2]. Why have a chipset with support for 20 lanes when the processor will only be able to handle up to 16?
Second, the PCIe port configurations supported by the chipset are "1x16, 2x8, 1x8+2x4". If I were to plug in two graphics cards, would they both work at x4 mode, or at x8/x4 modes? Shouldn't a motherboard designed for two graphics cards be able to handle 32+ PCIe lanes so that both cards can work at x16 mode?
The (up to) 20 PCIe lanes from the Z170 are in addition to the 16 lanes that come directly out of the CPU.
I don't see any reason it wouldn't run one graphics card at x16 and one at x4. But it does seem odd to me that they call it "Crossfire-ready" without two x16 slots.
More info on the Z170 here:
http://www.tomshardware.com/reviews/skylake-intel-core-i7-6700k-core-i5-6600k,4252-2.html
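If you want to check what link width each slot actually negotiated once the cards are installed, Linux exposes this through sysfs. Here is a minimal C sketch that reads one device's current_link_width attribute; the PCI address in the path is only an example, so substitute the address lspci shows for your card.

```c
/* Minimal sketch: print the negotiated PCIe link width of one device.
 * The device address below is an example; find yours with lspci. */
#include <stdio.h>

int main(void)
{
    const char *path =
        "/sys/bus/pci/devices/0000:01:00.0/current_link_width";

    FILE *f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char width[16];
    if (fgets(width, sizeof width, f))
        printf("current link width: x%s", width);

    fclose(f);
    return 0;
}
```

A card sitting in the x4-wired slot should report 4 here even though the physical connector accepts x16 cards.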
A PC has a microprocessor which processes 16 instructions per microsecond. Each instruction is 64 bits long. Its memory can retrieve or store
data/instructions at a rate of 32 bits per microsecond.
Mention 3 options to upgrade system performance. Which option gives the most improved performance?
And the answer provided is:
a) upgrade the processor to one with twice the speed
b) upgrade the memory to one with twice the speed
c) double the clock speed
(b) gives the most improved performance.
Overcoming the bottleneck of a PC improves its overall performance.
However, my problem is that I am not sure why (b) gives the most improved performance. Additionally, would (a) and (c) give the same performance? Can that be calculated? I am not sure how these different parts affect performance.
Your question's leading paragraph contains the necessary numbers to see why it's b):
The CPU's processing rate is fixed at 16 instructions per microsecond, so an instruction takes 1/16 of a microsecond to execute.
Each instruction is 64 bits long, but the memory system retrieves data at 32 bits per microsecond, so it takes two microseconds to retrieve a single instruction (i.e. 64 bits).
The bottleneck is clear: it takes far longer to retrieve an instruction (2 μs) than it does to execute it (1/16 μs).
If you increase the CPU speed (answer a), the CPU will execute an individual instruction faster, but it will still sit idle for roughly 2 μs waiting for the next instruction to arrive, so the improvement is wasted.
To eliminate the bottleneck entirely you would need to increase the memory system's speed to match the CPU's execution speed, i.e. the memory would need to deliver 64 bits every 1/16 μs (equivalently, 32 bits every 1/32 μs).
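To put numbers on this, here is a back-of-the-envelope model in C. It assumes, as the question's simplistic machine implies, that each 64-bit instruction is fetched serially from memory and then executed; option (c) is left out because it depends on what the master clock actually drives (discussed below). It prints roughly 0.48, 0.49 and 0.94 instructions per microsecond for the baseline, option (a) and option (b) respectively, which is why (b) is the better upgrade.

```c
/* Back-of-the-envelope model of the machine in the question, assuming each
 * 64-bit instruction must be fetched serially from memory before it executes. */
#include <stdio.h>

/* Throughput in instructions per microsecond, given the CPU execution rate
 * (instructions per microsecond) and the memory rate (bits per microsecond). */
static double throughput(double cpu_instr_per_us, double mem_bits_per_us)
{
    double fetch_us = 64.0 / mem_bits_per_us;   /* time to fetch one 64-bit instruction */
    double exec_us  = 1.0 / cpu_instr_per_us;   /* time to execute it */
    return 1.0 / (fetch_us + exec_us);
}

int main(void)
{
    printf("baseline:          %.2f instr/us\n", throughput(16.0, 32.0));
    printf("(a) 2x CPU speed:  %.2f instr/us\n", throughput(32.0, 32.0));
    printf("(b) 2x memory:     %.2f instr/us\n", throughput(16.0, 64.0));
    return 0;
}
```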
I assume answer c) refers to increasing the speed of some system-wide master clock, which would also increase the CPU and memory data rates. This would improve performance, but the CPU would still be limited by the memory speed.
Note that your question describes a simplistic computer. Computers were like this originally: the CPU accessed memory directly, instruction by instruction. However, as CPUs got faster, memory did not, so computer engineers added cache levels: much faster (but much smaller) memory from which instructions and data can be read as fast as the CPU can execute them, solving the bottleneck without needing to make all system memory match the CPU's speed.
I am currently volunteering to learn about Linux servers, and I am also interested in learning about cluster computing techniques.
In this lab, they have a small cluster with one head node and two compute nodes.
When I ran the lscpu command on the head node and on compute node1 and node2, I got the following values:
CPUs: 24 on the head node, computenode1 and computenode2. Does this refer to 24 physical CPUs on the motherboard?
Sockets: 2 on the head node, computenode1 and computenode2. Can anyone explain this?
Cores per socket: 6 on the head node, computenode1 and computenode2. Can anyone explain this?
Threads per core: 2 on the head node, computenode1 and computenode2. Can anyone explain this?
A socket is the physical socket on the motherboard where a physical CPU package is placed. A normal PC has only one socket.
Cores are the number of CPU cores per CPU package. A modern CPU for a standard PC usually has two or four cores.
And some CPUs can run more than one parallel thread per CPU core. Intel (the most common CPU manufacturer for standard PCs) has either one or two threads per core, depending on the CPU model.
If you multiply the number of sockets, cores per socket and threads per core, i.e. 2*6*2, you get the number of "CPUs": 24. These aren't separate physical CPUs, but the number of parallel threads of execution your system can run.
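As an illustration, here is a minimal C sketch for Linux/glibc that performs the same multiplication with the figures from your lscpu output and compares the result with what the OS reports as the number of online logical processors; the 2/6/2 values are simply copied from your machines.

```c
/* Minimal sketch (Linux/glibc): the "CPUs" figure lscpu shows is the count of
 * logical processors, i.e. sockets * cores-per-socket * threads-per-core. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Figures taken from the lscpu output in the question. */
    int sockets          = 2;
    int cores_per_socket = 6;
    int threads_per_core = 2;

    long online = sysconf(_SC_NPROCESSORS_ONLN);   /* logical CPUs the OS sees */

    printf("sockets * cores * threads = %d\n",
           sockets * cores_per_socket * threads_per_core);
    printf("logical CPUs online       = %ld\n", online);
    return 0;
}
```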
Just the fact that you have 6 cores is a sign you have a high-end workstation or server computer. The fact that you have two sockets makes it a very high-end computer. Usually not even high-end workstations have that these days, only servers.
Besides just running an infinite loop, are there any tricks (like maybe cache misses?) to making a CPU as hot as possible?
This could be architecture specific or not.
I have produced several reports on stress testing PCs via my free reliability/burn-in tests for Windows and Linux. You can find the reports by googling for "Roy Longbottom burn-in".
What you need is a variety of programs that run at high speeds to test CPUs, caches and RAM. They should log and display speeds at reasonably short intervals with temperatures noted or preferably also logged. On running them, you can find which are the most effective. You can run multiple copies concurrently via BAT files for Windows or shell scripts with Linux, rather than relying on more complicated multithreading programs. You also really need programs that check for correct results of calculations. For system testing, one of the programs can use graphics. Here, for nVidia, CUDA programs are useful in producing high temperatures.
The following shows CPU core and case temperatures before and after blowing the dust out of the heatsink.
The following are results on a laptop, testing with data in the L1 cache. These show variations in speed according to temperature. Other CPUs might be affected differently, depending on the data, the instructions and which cache is used.
Overheating Core 2 Duo Laptop 1.83 GHz

                5K words, 2 ops/word     5K words, 32 ops/word
Minute          Core °C    MFLOPS x2     Core °C    MFLOPS x2
 0.0               65           -           65           -
 0.5               96         4716          91        10168
 1.0               98         3362          94         4756
 1.5               91         2076          87         4443
 2.0               87         2054          86         4452
 2.5               85         2054          85         4235
 3.0               84         2036          84         4237
 3.5               82         3098          83         4376
 4.0               89         4773          83         4420
You might also be interested in my Raspberry Pi tests ("Cooking The Pi"), where the RPi is overheated with a 60 W lamp, making it crash when overclocked and showing speed variations with temperature. There, the CPU's integrated graphics is the most demanding hardware.
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Stress%20Tests.htm#anchor7
Since modern CPUs are usually power aware, they have many built-in mechanisms to power down whole parts (cores on a multicore system, or even individual internal units).
In order to really stress the CPU, you need to make sure you activate as many units as possible.
This is most easily achieved by heavily loading your vector execution units if you have them, or the floating-point units otherwise. On the memory side, try to constantly consume as much bandwidth as possible in order to stress the memory unit, the caches and the memory buses. Naturally, run your code on all available cores (and if hyperthreading is available, it is probably a good idea to use that too).
This type of workload is commonly known as a power virus. There are several examples on the web; you can look up cpuburn, for example (it has several sub-flavours).
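As a rough illustration of the idea of keeping every hardware thread busy with floating-point work, here is a minimal OpenMP sketch in C. It is nowhere near as thorough as cpuburn or a proper burn-in suite (it does not exercise the wide vector units or stress memory bandwidth), and the constants are arbitrary; treat it only as a starting point.

```c
/* Minimal "power virus" sketch: one endless floating-point loop per hardware
 * thread. Stop it with Ctrl-C.
 * Compile with: gcc -O2 -fopenmp -o burn burn.c -lm */
#include <math.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel            /* one worker per available hardware thread */
    {
        double x = omp_get_thread_num() + 1.0;

        for (unsigned long i = 0; ; i++) {
            /* Keep the FP units busy; the accumulation and the occasional
             * print stop the compiler from removing the loop. */
            x = x * 1.0000001 + sin((double)i);
            if (x > 1e12)
                x = 1.0;

            if ((i & 0xFFFFFFUL) == 0)
                fprintf(stderr, "thread %d alive, x = %g\n",
                        omp_get_thread_num(), x);
        }
    }
    return 0;
}
```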
There are applications that run so-called "stress tests".
They run tests that stress the CPU and any other hardware in order to see whether anything produces errors. They are used for stability purposes, as you can imagine, to make sure the CPU/device/system will be OK under heavy load.
Just look up stress-test software; you'll find plenty of options.
How are GPUs faster than CPUs? I've read articles that talk about how GPUs are much faster than CPUs at breaking passwords. If that's the case, then why can't CPUs be designed in the same way as GPUs to match their speed?
GPUs get their speed at a cost. A single GPU core actually runs much slower than a single CPU core. For example, the Fermi-based GTX 580 has a core clock of 772 MHz. You wouldn't want your CPU to run at such a low clock nowadays...
The GPU, however, has several cores (up to 16), each operating in a 32-wide SIMD mode. That amounts to roughly 500 operations executed in parallel. Common CPUs, by contrast, have up to 4 or 8 cores and can operate with 4-wide SIMD, which gives much lower parallelism.
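To make the arithmetic explicit, here is a tiny C calculation using the figures from this answer; the 3 GHz CPU clock is my own assumption for a typical desktop part of that era, not a number from the answer, and real throughput depends on far more than lanes times clock.

```c
/* Back-of-the-envelope comparison using the figures above.
 * GPU: 16 cores, 32-wide SIMD, 772 MHz (GTX 580-class).
 * CPU: 4 cores, 4-wide SIMD; the 3 GHz clock is an assumption. */
#include <stdio.h>

int main(void)
{
    double gpu_lanes = 16 * 32;             /* ~512 parallel lanes */
    double cpu_lanes = 4 * 4;               /* 16 parallel lanes   */

    double gpu_ops = gpu_lanes * 772e6;     /* lanes * clock, operations per second */
    double cpu_ops = cpu_lanes * 3.0e9;

    printf("GPU: %.0f lanes -> %.2e ops/s\n", gpu_lanes, gpu_ops);
    printf("CPU: %.0f lanes -> %.2e ops/s\n", cpu_lanes, cpu_ops);
    printf("ratio: %.1fx in the GPU's favour\n", gpu_ops / cpu_ops);
    return 0;
}
```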
Certain types of algorithms (graphics processing, linear algebra, video encoding, etc...) can be easily parallelized on such a huge number of cores. Breaking passwords falls into that category.
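Password brute-forcing parallelizes well because every candidate can be checked independently of every other. Here is a minimal C/OpenMP sketch of that shape of work; check_candidate() is a placeholder rather than a real hash comparison, and the candidate range is arbitrary.

```c
/* Minimal sketch of an embarrassingly parallel search: each candidate is
 * checked independently, so the loop can be split across any number of
 * cores (or, on a GPU, thousands of threads).
 * Compile with: gcc -O2 -fopenmp search.c */
#include <stdio.h>

static int check_candidate(long candidate)
{
    return candidate == 123456789L;   /* placeholder "hash check" */
}

int main(void)
{
    long found = -1;

    #pragma omp parallel for          /* iterations are fully independent */
    for (long c = 0; c < 1000000000L; c++) {
        if (check_candidate(c))
            found = c;                /* at most one iteration ever writes */
    }

    printf("found: %ld\n", found);
    return 0;
}
```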
Other algorithms, however, are really hard to parallelize. There is ongoing research in this area... Those algorithms would perform really badly if they were run on the GPU.
The CPU companies are now trying to approach GPU-style parallelism without sacrificing the capability of running single-threaded programs, but the task is not an easy one. The Larrabee project (since abandoned) is a good example of the problems: Intel worked on it for years, yet it is still not available on the market.
GPUs are designed with one goal in mind: process graphics really fast. Since this is their only concern, they include specialized optimizations that allow certain calculations to go a LOT faster than they would in a traditional processor.
In the case of password cracking (or the molecular-dynamics Folding@home project), what has happened is that programmers have found ways of leveraging these optimizations to do things like crunch passwords at a much faster rate.
Your standard CPU has to handle many more different types of calculation and processing than graphics processors do, so it can't be optimized in the same manner.